Abstract

We consider the source-channel separation architecture for lossy source coding in general communication 
networks. It is shown that the separation approach is optimal in two general scenarios, and is approximately optimal in 
a third scenario. The two general scenarios for which separation is optimal complement each other: the first scenario is 
when the memoryless sources at source nodes are arbitrarily correlated, each of which is to be reconstructed at possibly 
multiple destinations within certain distortions, but the channels in this network are synchronized, orthogonal and 
memoryless point-to-point channels; the second scenario is when the memoryless sources are mutually independent, 
each of which is to be reconstructed only at one destination within a certain distortion, but the channels are general, 
including multi-user channels such as multiple access, broadcast, interference and relay channels, possibly with 
feedback. The third general scenario, for which we demonstrate approximate optimality of source-channel separation, 
relaxes the second scenario by allowing each source to be reconstructed at multiple destinations. For this case, the
loss from optimality incurred by the separation approach can be upper-bounded when a "difference" distortion measure
is used, and in the special case of the quadratic distortion measure, this leads to universal constant bounds.

These results are shown without explicitly characterizing the achievable joint source-channel coding distortion 
region or the achievable separation-based coding distortion region. Such an approach of identifying properties without 
explicit individual component solutions may lead to further insights into network information theory problems. 
Furthermore, for the first general scenario, the extracted pure network source-coding problem has to incorporate a 
large number of rounds of user interactions and the corresponding causality constraints, which suggests a distinct 
research direction into interactive network source coding that has not received much attention in the literature. 
I. Introduction

Shannon's source-channel separation theorem asserts that there is no essential loss in point-to-point
communication systems when the source coding component and the channel coding component are designed
and operated separately [1]. This separation architecture simplifies the overall communication system
tremendously, because the decoupled subsystems are much easier to design and implement, with the
codeword index as the only interface between the source coding component and the channel coding
component. Unfortunately, it has been shown that the separation approach is not optimal in very simple
multiuser scenarios (e.g., [2]), which suggests that the optimality of source-channel separation may not hold
beyond the conventional point-to-point case.
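As a point-to-point illustration of the separation theorem, consider the textbook case (not one of this paper's results) of a Gaussian source with variance σ² sent over a real AWGN channel at one channel use per source sample. Separation achieves any distortion D with R(D) = ½log₂(σ²/D) ≤ C = ½log₂(1 + SNR), so the best separation-based distortion is D = σ²/(1 + SNR). The Python sketch below checks this arithmetic; the function names are illustrative choices of our own.

```python
import math


def gaussian_rd(variance: float, distortion: float) -> float:
    """Rate-distortion function R(D) in bits/sample of a Gaussian source under squared error."""
    if distortion >= variance:
        return 0.0
    return 0.5 * math.log2(variance / distortion)


def awgn_capacity(snr: float) -> float:
    """Capacity C in bits per channel use of a real AWGN channel with the given SNR."""
    return 0.5 * math.log2(1.0 + snr)


def separation_distortion(variance: float, snr: float) -> float:
    """Smallest D satisfying R(D) <= C with matched bandwidth (one sample per channel use)."""
    # Solving 0.5*log2(variance / D) = 0.5*log2(1 + snr) for D gives variance / (1 + snr).
    return variance / (1.0 + snr)


if __name__ == "__main__":
    variance, snr = 1.0, 15.0  # assumed example values
    d = separation_distortion(variance, snr)
    print(f"D = {d}, R(D) = {gaussian_rd(variance, d)}, C = {awgn_capacity(snr)}")
```

With these assumed values the source rate R(D) exactly meets the channel capacity C, which is the boundary condition a separation-based design must satisfy.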

Because of the clear benefits of the source-channel separation architecture, it is important to understand 
the optimality issue better. In this work, we seek to answer the following sequence of questions: is there a 
general class of multiuser communication systems for which: 

• The separation approach is optimal? 

• If the answer to the first question is negative, then is the separation approach at least approximately 
optimal? 

The difficulty in answering these questions lies in the fact that in most multiuser communication scenarios, 
we do not have explicit characterizations of the rate-distortion regions, the channel capacity regions, or the 
joint coding achievable distortion regions; however, in order to determine whether the separation approach is 
optimal, it is natural to first couple the rate-distortion region with the channel capacity region, and then compare
the result with the joint coding achievable distortion region. With at least one region unknown in most cases, it seems
utterly impossible to answer the above questions even in some of the simplest settings (e.g., communicating 
sources on an interference channel), let alone in more complex networks. In this work, we show that this 
difficulty in determining the optimality of source-channel separation can in fact be circumvented completely 
in several important settings, and the answers to the sequence of questions posed earlier are indeed positive. 

More precisely, we show that for lossy coding of memoryless sources in a network, the source-channel 
separation approach is optimal for the following two general scenarios: the first scenario, referred to as 
distributed network joint source-channel coding (DNJSCC), is when the sources are arbitrarily correlated, 
each of which is to be reconstructed at possibly multiple destinations within certain distortions, but the 
channels between any pair of nodes¹ in this network are synchronized, orthogonal, and memoryless; the
second scenario, referred to as joint source-channel multiple unicast with distortion (JSCMUD), is when 
the sources are mutually independent, each of which is to be reconstructed only at one destination within a 
certain distortion, but the channels can be general, including multi-user channels such as multiple access, 
broadcast, interference and relay channels, possibly with feedback. 

The third scenario is a natural extension of the second one that allows a source to be reconstructed
at multiple destinations; this case is referred to as joint source-channel multiple multicast with distortion
(JSCMMD). For this scenario, the classical example of sending a Gaussian source over a Gaussian broadcast
channel [3] reveals that the source-channel separation approach is not optimal in general. Thus we turn our
attention to whether the separation approach is approximately optimal, and show that under the "difference"
distortion measure, it is indeed so in the sense that the loss from the optimum can be upper-bounded². In the
important special case of the quadratic distortion measure, the upper bound is at most 0.5 bit per (additional)
user that reconstructs the same source.

¹We do not consider channels on a hyperedge in this work; however, the result can be straightforwardly extended
to hypergraphs where any directed hyperedge is from a single transmitter node to multiple receiver nodes, whose
channel outputs from this hyperedge are identical.
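To get a feel for the 0.5-bit figure, one can translate a rate gap into distortion for a Gaussian source under squared error, whose distortion-rate function is D(R) = σ²·2^(-2R). This translation is our own illustration, not a statement from the paper: losing Δ bits per sample multiplies the distortion by 2^(2Δ), so a 0.5-bit gap costs a factor of 2 (about 3 dB). The helper names below are hypothetical.

```python
def gaussian_distortion_at_rate(variance: float, rate_bits: float) -> float:
    """Distortion-rate function D(R) = variance * 2^(-2R) of a Gaussian source (rate clipped at 0)."""
    return variance * 2.0 ** (-2.0 * max(rate_bits, 0.0))


def distortion_penalty(rate_gap_bits: float) -> float:
    """Factor by which distortion grows when rate_gap_bits per sample are lost (away from zero rate)."""
    return 2.0 ** (2.0 * rate_gap_bits)


if __name__ == "__main__":
    variance, rate = 1.0, 3.0  # assumed source variance and rate in bits/sample
    for gap in (0.0, 0.5, 1.0):
        d = gaussian_distortion_at_rate(variance, rate - gap)
        print(f"gap={gap} bit: D={d:.6f}, penalty factor={distortion_penalty(gap)}")
```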

Though our results demonstrate the effectiveness of source-channel separation, a large portion of the 
difficulty in designing efficient codes remains in the extracted pure source coding problem or the extracted 
pure channel coding problem. Particularly for the DNJSCC problem, the network induces a pure source- 
coding problem with rather complex multiple-session user interactions where each session has the same rate, 
which suggests a distinct and perhaps under-studied line of research in network source coding. Some simple 
interactive settings have indeed been considered in the literature: Kaspi considered the lossy two-way source
coding problem [4] (see more recent results in [5], [6]), and in a series of papers, Orlitsky considered the
lossless two-way problem from the perspective of worst-case vs. average-case communication complexity
[7]-[9]. Our work suggests that it is important to consider interactive coding problems with a
large (or infinite) number of sessions, where the interactive communication rates do not change from
session to session. This class of source-coding problems naturally occurs in practice, and their solutions in
fact solve a large class of joint source-channel coding problems.

When discussing source-channel separation, it is natural to assume that the sources are independent of the 
channels, which is often the default assumption in the information theory literature, and it is also assumed 
in this work. More precisely, the assumption is that the channel output is conditionally independent of the 
sources given the channel input. This is because otherwise, even if the encoding and decoding functions 
are designed according to the separation architecture, the inherent dependence between source and channel 
will render such a separation rather meaningless even in a point-to-point setting, and joint source-channel 
coding is expected to be more effective in these scenarios. 
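One way to write this assumption in symbols, with $s$ denoting all sources jointly, $x^t$ all channel inputs up to time $t$, and $y^{t-1}$ all channel outputs up to time $t-1$ (notation assumed here for illustration; the formal definitions come in later sections), is that for every channel use the channel law does not depend on the sources:

```latex
P\left(y_t \mid x^t, y^{t-1}, s\right) = P\left(y_t \mid x^t, y^{t-1}\right), \qquad t = 1, \dots, n.
```

For a memoryless channel the right-hand side further reduces to $P(y_t \mid x_t)$.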

In the rest of this section, we give a brief overview of our proof approach, and discuss the relation 
between this work and existing work in the literature. 

A. Overview of the Indirect Approach 

As mentioned earlier, the direct proof approach, which first characterizes all the relevant regions in
single-letter form and then compares them, is rather difficult; instead, our approach is an indirect
one.

For the distributed network joint source-channel coding problem, the key observation is in fact extremely
simple: abstract the block channel input into a super source that can be compressed at a certain rate.
Roughly speaking, for a fixed joint source-channel code, the block channel input of this code
can be viewed as a super source, and thus we can simulate the channel output by compressing this super
source with some lossy source code, whose compression rate is upper-bounded by the channel capacity
of the original channel. Thus any joint source-channel code can be converted into a separation-based code
with asymptotically the same performance, which implies the optimality of source-channel separation. The
first example in Section II illustrates this idea for a simple DNJSCC problem. For more complex networks
with relays and feedback, there is an additional difficulty introduced by the interaction of the users, which
can be resolved by properly interleaving many copies of the original joint source-channel code (see also
the discussion in the next subsection regarding the relation with [10]).

²We will make the statement more precise in the subsequent sections.

The technique we rely on to tackle the second scenario, i.e., joint source-channel multiple unicast with 
distortion, is complementary to the previous case. Here the overall communication system is abstracted 
into an interference channel with certain individual mutual information guarantees that are implied by the 
achievable end-to-end distortions. Roughly speaking, for a fixed joint source-channel code, the deterministic
encoding-decoding operations together with the probabilistic channel operations can be viewed as an induced
(memoryless) super block channel, and these mutual information guarantees can be utilized to provide a
lower bound on the capacity of this super block channel. Thus the sources can be sent over this new super
block channel, and it follows again that any joint source-channel code can be converted into a
separation-based scheme with asymptotically the same performance. The second example in Section II illustrates
this case with a small interference network.

For the third scenario of joint source-channel multiple multicast with distortion, a similar abstraction 
yields a more complex interference channel. However, for this case, the mutual information guarantees are 
less straightforward, and they have to be derived in conjunction with the separation-based coding scheme. 
The separation architecture we use here is based on successive refinement codes for each source [11], [12],
concatenated with delivery of messages within degraded message sets [13]. By strategically comparing the
rate expressions of this separation scheme and utilizing the rate loss technique [14], we are able to
upper-bound the performance gap between the joint source-channel scheme and the separation-based scheme (see
also discussion in the next subsection). As in earlier scenarios, this result does not rely on an explicit 
solution of either the degraded message set problem or the joint source-channel coding problem. 

B. Relation to Existing Works 

Closely related to source-channel separation in the DNJSCC problem is the problem of separation between
network coding and channel coding, which has received considerable attention in recent years [10], [15]-[17].
In general, the approach based on separating network coding and channel coding is also not optimal
[17]. However, a surprising result by Koetter, Effros and Medard [10] essentially states that for general
multicast on networks with orthogonal, synchronized and memoryless point-to-point channels (i.e., "noisy"
graphs), there is no loss of optimality by employing such a separation. The DNJSCC problem we consider
can be thought of as a generalization of the problem studied in [10] to correlated sources with distortions,
and in fact the interleaving technique in our proof is directly borrowed from [10]. The super source view
and the channel simulation idea are also inherent in the proof in [10]; however, they are encapsulated
in the "stacked network" technique, and thus less transparent. In contrast, we explicitly apply the super 
source view and the channel simulation idea on the original network instead of the "stacked network", 
which results in a conceptually more straightforward proof, despite the more general setting of lossy source 
coding and correlated sources³. It is worth noting that [10] does not directly focus on the issue of
source-channel separation, and the result is given from the perspective of an equivalence model. For this reason,
the induced pure block source coding problem is not defined formally in their work. In contrast, we directly 
consider the separation issue on the original network, and the induced pure source coding problem shall be 
defined explicitly in the traditional block coding framework. 

Another approach of treating source-channel separation for various special cases in DNJSCC is to utilize 
the infinite letter expressions, instead of the single letter characterizations, of the source coding rate regions 
and the joint coding achievable distortion regions. This is also an indirect approach, since the infinite letter 
expressions are not computable, and thus do not qualify as characterizations; see [19] for discussions on
the computability issue. In [20], Yeung applied this indirect approach to several classical multiuser source
coding problems on orthogonal communication channels, including the (two-user) distributed source coding
problem [21], the multiple description problem [22] and the cascade communication problem [23]. Xiao and
Luo [24] applied this approach to the problem of distributed source coding (with any number of users) on
orthogonal multiple access channels. Though this approach is successful in these simple settings, it becomes
rather unwieldy in more complex networks. In contrast, our proof approach can be naturally applied to more
general networks, and consequently the result for the DNJSCC problem in this work subsumes those in
[20] and [24].

A more recent work related to the DNJSCC problem is by Han [25], where the problem of lossless coding
of correlated sources on acyclic networks (i.e., no feedback) with orthogonal channels is considered; see
also [26], [27] for earlier results on such networks with a single sink node. Han provided a necessary
and sufficient condition for transmissibility in terms of source entropies and channel cut-set capacity. The
approach taken in [25] is to first give a necessary condition for transmissibility using conventional information
inequalities, and then show that a separation-based approach can transmit the sources as long as this
condition is satisfied. This proof approach appears difficult to apply to networks with feedback and lossy
reconstructions. Interestingly, the result in [25] establishes a source-channel separation different from the
one we consider in this work, and this intriguing point will be revisited in the last section. 

Similar to the approach we take in the JSCMUD problem, the super-channel view was also used in [28], [29],
which focus mainly on non-ergodic point-to-point channels; our focus, in contrast, is on ergodic channels in a
more general network setting. The network scenario is quite different from the point-to-point case since it
induces a coupling of the transmitted signals through the abstracted interference channel.

³The benefit of the proof in [10] is that the same proof applies when the channels and sources have continuous
alphabets. Since our proof for the DNJSCC problem relies on the Markov Lemma, which is available only for strongly
typical sequences, extending our proof to such cases is technically non-trivial and usually requires a delicate
analysis of quantization into discrete alphabets [18]. Thus, strictly speaking, our result on DNJSCC does not
subsume the result in [10], though we believe our result on DNJSCC also holds for more general alphabets.
Note that the proofs for the JSCMUD and JSCMMD problems do not critically rely on strongly typical sequences,
and thus can be easily extended to the case of continuous alphabets.

The difficulty encountered in the JSCMMD problem is similar to that in [30], where a simpler version
of the problem, i.e., broadcasting a single Gaussian source with bandwidth mismatch, was considered.
The proof approach in [30] is to introduce many additional auxiliary random variables not in the original
problem (a technique originally applied to the multiple description problem; see [31]), and then derive an
explicit outer bound by leveraging the Markov relations among them. Our approach for the JSCMMD problem is a
generalization of this technique; however, the introduced auxiliary random variables serve another role in
addition to providing an outer bound: their probability distribution is also used to generate codewords in the
successive refinement source codes. By strategically utilizing the rate loss technique [14] together with these
auxiliary random variables, we are able to upper-bound the performance loss of the separation-based scheme.

The DNJSCC problem and JSCMUD problem, for which source-channel separation is optimal, include 
many cases previously considered in the literature. For example, the joint source-channel coding problem 
of successive refinement coding with degraded decoder side information in [32], [33], and the two-way
successively refined joint source-channel coding problem considered in [34] are special cases of the DNJSCC
problem; when the sources are independent, the problem of sending a pair of Gaussian sources over Gaussian
broadcast channels [35] and that of sending a pair of Gaussian sources over interference channels [36] are
special cases of the JSCMUD problem.

Shannon's source-channel separation in the classical sense has several immediate implications, some of 
which may be taken individually as weaker notions of source-channel separation. For example, Tuncel
[37] discovered an "operational separation" where the source codebook and the channel codebook are
largely designed independently, with only certain codeword indices serving as their connection.
This notion of operational separation is weaker than the classical source-channel separation, which was
referred to as "information separation" in [37]. To see this, observe that in the classical source-channel
separation architecture, the digital codeword indices are the only information interface between the source
coding component and the channel coding component; in particular, the source decoder takes the decoded
digital codeword indices from the channel decoder as the sole information originating from the
channel output, and ignores any other information from the channel. The decoder in the coding strategy
in [37] is in fact a joint decoder which cannot be separated into two components. In our work,
we shall use the notion of source-channel separation in the classical (information separation) sense; this
point regarding the meaning of source-channel separation will be revisited in later sections.

C. Paper Organization 

The rest of this paper is organized as follows. In Section II we discuss a few examples to provide some
intuition for the solutions. Necessary notation and definitions are given in Section III. The main results
and the proofs on DNJSCC, JSCMUD and JSCMMD are given in Sections IV, V and VI, respectively.
Section VII concludes the paper.
Fig. 1. Transmitting correlated sources on an interference network.

II. Three Examples 

In this section we discuss three examples in the context of sending sources on interference channels 
to provide some intuitions for the optimality (or approximate optimality) of source-channel separation in 
DNJSCC, JSCMUD and JSCMMD. The main results of this work are built on these intuitions, and Sections
IV, V and VI essentially make them more precise and rigorous. For simplicity, the channel bandwidth and
source bandwidth are assumed to match. 

A. An Example for Distributed Network Joint Source-Channel Coding 

Consider the example depicted in Fig. 1, where the discrete memoryless sources $S_1, S_2$ are correlated.
Each discrete memoryless channel between a transmitter and a receiver is orthogonal to the other channels.
More precisely, the channel from node $i$ to node $j$ has transition probability $P(Y_{ij}|X_{ij})$, and the overall
transition probability is given by $\prod_{(i,j)} P(Y_{ij}|X_{ij})$. Both node 3 and node 4 require a lossy reconstruction
of source $S_1$, denoted as $\hat{S}_{1,3}$ and $\hat{S}_{1,4}$, respectively. Node 4 also requires a lossy reconstruction of source
$S_2$, denoted as $\hat{S}_{2,4}$.

Though the capacity region of this special case of the interference channel is not difficult to establish, 
the rate-distortion region of the source coding problem is not known, and this problem is at least as 
difficult as the well-known distributed source coding problem [21]. Thus the conventional proof approach
of characterizing separately the rate-distortion region, channel capacity region and the joint source-channel 
coding achievable distortion region, and then making comparison, does not yield the desired separation 
result. Next, we illustrate through this example the methodology that enables us to prove the optimality of 
source-channel separation for the DNJSCC problem. 

Suppose there exists a length-$n$ joint source-channel code that achieves the distortion triple
$(D_{1,3}, D_{1,4}, D_{2,4})$. The key observation is the following simple fact. If we fix this joint source-channel
code, then the channel input for any given channel, for example $X_{13}^n$, can be viewed as a super (block)
source, independent and identically distributed across blocks; see Fig. 2. Therefore, we can encode a length-$n'$
sequence of such blocks using a "rate-distortion" code whose rate per block is slightly greater than
$I(X_{13}^n; Y_{13}^n)$ and whose codewords are generated according to the distribution of $Y_{13}^n$. It follows
that, with probability approaching one as $n'$ goes to infinity, we can find a codeword in the codebook that is
jointly typical with the channel input sequence $(X_{13}^n)^{n'}$ (i.e., a length-$n'$ vector of super source
samples). This lossy source code essentially simulates the channel output over $n'$ length-$n$ blocks, and only
the codeword index needs to be known at node 3 to reconstruct the simulated channel output $Y_{13}^n$ of each
block. As such, this digital codeword index can simply be sent across this channel using any good channel code,
and the original joint source-channel decoding function can be applied to this simulated channel output, which
eventually (asymptotically) achieves the same distortion as the original code. Since
$I(X_{13}^n; Y_{13}^n) \le nC_{13}$, where $C_{13}$ is the capacity of the channel between node 1 and node 3,
this rate-distortion codeword index can be expected to be reliably transmitted on this channel. Replacing all
the channel outputs in this problem with such simulated outputs results in a new scheme. In this new scheme,
the codeword indices of these "rate-distortion" codes are the only informational interface between the source
coding component and the channel coding component, so it is a separation-based scheme, and it asymptotically
achieves the same distortions $(D_{1,3}, D_{1,4}, D_{2,4})$ originally achieved by the joint coding scheme. In
other words, any distortions that are achievable by a joint coding scheme can also be achieved by a
separation-based scheme.

Fig. 2. Converting a joint source-channel code into a separation-based code on an individual orthogonal channel.
The dashed lines give the partition between the source coding component and the channel coding component.
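The rate bookkeeping behind this argument uses the fact that, for a memoryless link without feedback such as those in Fig. 1, $I(X_{13}^n; Y_{13}^n) \le \sum_t I(X_{13,t}; Y_{13,t}) \le nC_{13}$, whatever the joint-coding encoder does to the inputs. As a minimal sanity check, assuming purely for illustration that the link is a binary symmetric channel (BSC) with crossover probability ε, the Python sketch evaluates the per-letter mutual information over a grid of input distributions and compares it with C = 1 − h(ε). All names and parameter values are hypothetical.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_mutual_information(q: float, eps: float) -> float:
    """I(X;Y) in bits for a BSC with crossover eps and input distribution P(X=1) = q."""
    p_y1 = q * (1.0 - eps) + (1.0 - q) * eps  # P(Y = 1)
    return h2(p_y1) - h2(eps)  # H(Y) - H(Y|X)


def bsc_capacity(eps: float) -> float:
    """Capacity 1 - h(eps) of the BSC, attained by the uniform input."""
    return 1.0 - h2(eps)


if __name__ == "__main__":
    eps, n = 0.1, 8  # hypothetical crossover probability and block length
    cap = bsc_capacity(eps)
    best = max(bsc_mutual_information(k / 100, eps) for k in range(101))
    print(f"C = {cap:.4f} bits/use, max I(X;Y) on grid = {best:.4f}, n*C = {n * cap:.4f} bits/block")
```

In the asymptotic sense used above, a simulating code of rate slightly above $I(X_{13}^n; Y_{13}^n)$ per block therefore asks the channel code for at most about $nC_{13}$ bits per block.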

The above observation largely reflects the intuition behind the optimality proof of source-channel
separation for the general DNJSCC problem; however, some technical details (besides the asymptotically
diminishing quantities omitted in the above discussion) need to be addressed. The main difficulty is that
when the network has relays or cycles, the super source argument given above does not apply, since channel
usage constraints prevent coding over long super-channel blocks directly. The proof given in Section IV
resolves this difficulty through an intricate arrangement of channel simulation.



B. Examples for Joint Source-Channel Multiple Unicast with Distortions 

Consider the problem depicted in Fig. 3, where the sources $S_1$, $S_2$ and $S_3$ are mutually independent;
here the interference channel is more generally given by the transition probability $P(Y_3, Y_4|X_1, X_2)$, where
$X_1, X_2$ are the channel inputs by node 1 and node 2, respectively, and $Y_3, Y_4$ are the channel outputs at
node 3 and node 4, respectively.

Fig. 3. Transmitting mutually independent $S_1, S_2, S_3$ on an interference channel.

Fig. 4. Transmitting mutually independent $S_1, S_2, S_3$ on an interference channel to multiple destinations, i.e.,
source $S_1$ is required at both destination node 3 and node 4.

Since the capacity region of the interference channel is unknown, it is impossible to explicitly characterize
(in a single-letter manner) the achievable distortion region of the separation approach. However, suppose that for
the problem in Fig. 3 a distortion triple $(D_1, D_2, D_3)$ is achievable using some joint source-channel code of
length $n$. The conventional rate-distortion theorem [38] dictates that we must have $I(S_1^n; \hat{S}_1^n) \ge nR_1(D_1)$,
where $R_1(D)$ is the rate-distortion function of source $S_1$. The key is now the following simple fact, which is
different from that discussed for the DNJSCC problem: if we fix this particular source-channel code, then the
transition probability $P(\hat{S}_1^n, \hat{S}_2^n, \hat{S}_3^n | S_1^n, S_2^n, S_3^n)$ can be viewed as that of an alternative
super interference channel with three users. On this channel, the individual mutual information guarantee
$I(S_i^n; \hat{S}_i^n) \ge nR_i(D_i)$ holds for $i = 1, 2, 3$, due to the conventional rate-distortion theorem. Thus
intuitively, this super interference channel should be able to support a rate triple of
$(nR_1(D_1), nR_2(D_2), nR_3(D_3))$ per super-block channel use (or per $n$ original channel uses) by a digital
code. Now a good digital channel code on this super interference channel can be used to transmit digital
rate-distortion source codes for the sources $S_1$, $S_2$ and $S_3$, respectively. This is indeed a separation-based
scheme, and it can achieve the distortion triple $(D_1, D_2, D_3)$ asymptotically. In other words, any achievable
distortion triple $(D_1, D_2, D_3)$ can be achieved by the separation approach.
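The per-super-block targets $nR_i(D_i)$ depend on the sources only through their rate-distortion functions. As a hypothetical instance of our own choosing (not from the paper), let the three independent sources be Bernoulli($p_i$) under Hamming distortion, for which $R(D) = h(p) - h(D)$ when $D < \min(p, 1-p)$ and $R(D) = 0$ otherwise. The Python sketch computes the rate triple that the super interference channel would need to support per super-block of length $n$; the function names and parameter values are illustrative assumptions.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bernoulli_hamming_rd(p: float, d: float) -> float:
    """R(D) in bits/sample of a Bernoulli(p) source under Hamming distortion."""
    if d >= min(p, 1.0 - p):
        return 0.0
    return h2(p) - h2(d)


def super_block_rate_triple(sources, n: int):
    """Rates n*R_i(D_i) in bits per super-block, one per (p_i, D_i) pair."""
    return tuple(n * bernoulli_hamming_rd(p, d) for p, d in sources)


if __name__ == "__main__":
    n = 100  # hypothetical super-block length
    sources = [(0.5, 0.1), (0.3, 0.05), (0.2, 0.2)]  # hypothetical (p_i, D_i) pairs
    print([round(r, 2) for r in super_block_rate_triple(sources, n)])
```

The third entry is zero because $D_3 = \min(p_3, 1-p_3)$ can be met without sending anything, by reconstructing the all-zero sequence.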

In order to show that the super interference channel can indeed asymptotically support the rate triple
$(nR_1(D_1), nR_2(D_2), nR_3(D_3))$, we essentially need to construct (random) codes over large super-channel
blocks, and prove that the error probability can be made small, just as for conventional channels. The proof
in Section V follows this approach and makes the above intuitive argument more rigorous.

C. Examples for Joint Source-Channel Multiple Multicast with Distortions 

Consider the problem depicted in Fig. 4, which is only slightly different from that in Fig. 3 in that
source $S_1$ is to be reconstructed at both node 3 and node 4, denoted as $\hat{S}_{1,3}$ and $\hat{S}_{1,4}$, respectively; the
reconstruction of source $S_3$ at node 3 is denoted⁴ as $\hat{S}_{3,3}$ and the reconstruction of source $S_2$ at node
4 is denoted as $\hat{S}_{2,4}$. Taking a similar view as in the previous example, the abstracted channel now has
transition probability $P(\hat{S}_{1,3}^n, \hat{S}_{1,4}^n, \hat{S}_{2,4}^n, \hat{S}_{3,3}^n | S_1^n, S_2^n, S_3^n)$. However, the mutual
information bounds given by the conventional rate-distortion theorem cannot be directly used as in the previous case.
A moment of thought should convince the reader that the broadcast nature of the marginal transition probability
$P(\hat{S}_{1,3}^n, \hat{S}_{1,4}^n | S_1^n)$ is the culprit, and some additional coding component is needed.

A natural separation architecture is to use a successive refinement [11] source code to produce descriptions
satisfying the distortion requirements for each source, and to couple it with a superposition broadcast code [38]
to reliably deliver these messages in the degraded message set [13]. More precisely, in the example of Fig. 4,
assume without loss of generality that the distortion for source $S_1$ at node 3 is greater than that at node
4. We shall use a successive refinement code for $S_1$ to produce messages $(W_{1,1}, W_{1,2})$ such that $W_{1,1}$ is
to be delivered to node 3 and both $(W_{1,1}, W_{1,2})$ are to be delivered to node 4. Node 1 also produces a
message $W_{3,1}$ to encode source $S_3$, and node 2 produces a message $W_{2,1}$ to encode source $S_2$. The messages
$(W_{1,1}, W_{3,1})$ need to be reliably transmitted to node 3, and the messages $(W_{1,1}, W_{1,2}, W_{2,1})$ to node 4.
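For intuition on the message rates, suppose, purely as an illustration and not as the paper's general setting, that $S_1$ is Gaussian with variance σ² under squared error. A Gaussian source is known to be successively refinable, so with $D_{1,3} \ge D_{1,4}$ (both below σ²) the coarse message $W_{1,1}$ needs rate $R(D_{1,3})$ and the refinement $W_{1,2}$ needs $R(D_{1,4}) - R(D_{1,3}) = \frac{1}{2}\log_2(D_{1,3}/D_{1,4})$ bits per sample. The helper names and numbers in this Python sketch are hypothetical.

```python
import math


def gaussian_rd(variance: float, distortion: float) -> float:
    """R(D) in bits/sample of a Gaussian source under squared error."""
    if distortion >= variance:
        return 0.0
    return 0.5 * math.log2(variance / distortion)


def successive_refinement_rates(variance: float, d_coarse: float, d_fine: float):
    """Rates (bits/sample) of the coarse message W_{1,1} and the refinement W_{1,2}.

    Relies on the Gaussian source being successively refinable, so the two stages
    together cost exactly R(d_fine), with no rate loss.
    """
    if d_fine > d_coarse:
        raise ValueError("expected d_coarse >= d_fine (node 3 is the coarser user)")
    r_coarse = gaussian_rd(variance, d_coarse)
    r_refine = gaussian_rd(variance, d_fine) - r_coarse
    return r_coarse, r_refine


if __name__ == "__main__":
    # Hypothetical values: unit variance, D_{1,3} = 0.25 at node 3, D_{1,4} = 0.0625 at node 4.
    print(successive_refinement_rates(1.0, 0.25, 0.0625))
```

Node 4 decodes both messages at a total rate of $R(D_{1,4})$, while node 3 needs only $W_{1,1}$; the broadcast channel code must deliver these rates within the degraded message set.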

Let us for the moment isolate source $S_1$ and focus on the super block broadcast channel $P(\hat{S}_{1,3}^n, \hat{S}_{1,4}^n | S_1^n)$
with the messages $(W_{1,1}, W_{1,2})$, since this broadcast component is the main difficulty in generalizing the
proof approach for JSCMUD. We will introduce an auxiliary random variable (in general more than one auxiliary random
variable is needed), and show that this broadcast channel can support a certain rate pair for the degraded
message set broadcast [13], parametrized by this newly introduced random variable. The same probability
distribution of this auxiliary random variable is also used to construct the successive refinement source code
for $S_1$. However, the aforementioned channel coding rates for degraded message set broadcast are in fact
insufficient to support this successive refinement source code; nevertheless, the shortfall in the rates can
be upper-bounded by comparing the broadcast channel coding rates and the successive refinement source
coding rates, both of which are parametrized by this auxiliary random variable. This upper
bound implies the approximate optimality of source-channel separation in JSCMMD.

D. Remarks on the Separation Schemes 

One may argue that the schemes described above for the given examples are not really based on 
source-channel separation, because in the example for DNJSCC, the source codes embed in themselves 
the original joint source-channel codes, and thus these source codes are designed with the knowledge of the 
channel statistics. Similarly, in the examples for JSCMUD and JSCMMD, the channel codes also embed 
in themselves the original joint source-channel codes, and thus they are designed with the knowledge of 
the source statistics. Following this argument, it cannot be claimed that in these scenarios source-channel 
separation is optimal (or approximately optimal). This is, however, a rather subtle misconception. 

Indeed, in a scheme based on source-channel separation, the source code should not rely on the channel statistics, and the channel code should not rely on the source statistics. In fact, the separated source coding problem (i.e., characterizing the rate-distortion region) is defined to be the original joint coding problem without the channel statistics, and the separated channel coding problem (i.e., characterizing the capacity region) is defined to be the original joint source-channel coding problem without the source statistics; moreover, the optimality of source-channel separation can be defined (in a manner similar to Shannon's original work [1]) as the situation in which the source coding rate-distortion region, coupled with the channel coding capacity region, yields exactly the joint coding achievable distortion region. This is precisely the kind of result we shall establish in this work.

The notation used here may seem unnatural initially; however, it will become clear that this notation is convenient when generalizing to more complex networks.
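As a point of reference, the sketch below evaluates this notion of separation optimality in Shannon's classical point-to-point setting, for an example that is our own assumption rather than part of this work: a Bernoulli($p$) source with Hamming distortion, a binary symmetric channel with crossover probability $q$, and bandwidth mismatch factor $\kappa$. There, the rate-distortion function $R(D)=h(p)-h(D)$ (for $D<\min(p,1-p)$) and the capacity $C=1-h(q)$ are known in closed form (in bits), and a distortion $D$ lies in the separation region exactly when $R(D)\le\kappa C$. All function names are hypothetical.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits, with the convention h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def rate_distortion_bernoulli(p: float, d: float) -> float:
    """R(D) in bits per source sample for a Bernoulli(p) source under Hamming distortion."""
    if d >= min(p, 1 - p):
        return 0.0
    return h2(p) - h2(d)


def bsc_capacity(q: float) -> float:
    """Capacity in bits per channel use of a BSC with crossover probability q."""
    return 1.0 - h2(q)


def separation_achieves(p: float, d: float, q: float, kappa: float) -> bool:
    """True if R(d) <= kappa * C, i.e., the source code fits in kappa channel uses per sample."""
    return rate_distortion_bernoulli(p, d) <= kappa * bsc_capacity(q)
```

For example, with $p=0.5$ and $q=0.11$, the distortion $D=0.11$ is reachable at $\kappa=1$ (the boundary case $R(D)=C$), whereas $D=0.05$ needs $\kappa=2$.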

Without more precise mathematical definitions, it may be difficult to convince the reader that the arguments given in the previous subsections indeed show the optimality of source-channel separation. Nevertheless, here we provide an informal explanation in the context of the example in Fig. 3. Note that when each source in Fig. 3 is replaced by a message, we are left with an interference channel (with a common message between node 1 and node 2). The capacity region of this channel is unknown, and the question of whether the capacity-achieving codes for this interference channel depend on any source statistics is not even relevant, because there is no source in this new channel coding problem. In the previous discussion we have essentially shown that if, for a set of sources $(S_1,S_2,S_3)$, the distortion triple $(D_1,D_2,D_3)$ is achievable in the joint coding setting, then the rate triple $(r_1,r_2,r_3)=(R_1(D_1),R_2(D_2),R_3(D_3))$ is also achievable on this interference channel in the pure channel coding setting. Though the channel code constructed in the previous subsection relies on the source statistics, the statement that "the rate triple $(r_1,r_2,r_3)$ is achievable on this interference channel" does not depend on any source, and the newly constructed channel code can be understood as merely a tool to show that this statement is true. If the rate triple $(r_1,r_2,r_3)$ is achievable on this interference channel, then in the joint coding setting we can also use other capacity-achieving channel codes (unrelated to the original source statistics) to send the individual rate-distortion codeword index for each source $S_i$, which results in a separation scheme. Conversely, the channel code constructed in the previous subsection can also be used to send rate-distortion codes for any other sources, not necessarily the original ones, and this is also evidently a separation scheme. At this point, it is clear that we can indeed conclude that source-channel separation is optimal in this setting. In Sections IV, V, and VI, the formal proofs follow a similar line of argument in a more rigorous manner.



III. Notation and Definitions 

In this section, we provide the notation and define the codes in consideration. The notation would become 
rather unwieldy if a unified framework were used for all the problems treated in this work. We therefore 
forgo this ambitious goal and define the problems separately; however, when convenient, the notation will be 
kept consistent among them. We focus on the problems with finite-alphabet discrete sources, finite-alphabet 
discrete channels and bounded distortion measures, unless noted otherwise explicitly. 

Or equivalently: a distortion vector (matrix) is achievable if and only if the rate region of the source coding problem intersects the capacity region of the channel coding problem.

A. Definitions for the Distributed Network Joint Source-Channel Coding Problem

For this case, the network with a total of $N$ nodes can be conveniently written as a directed graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}=\{1,2,\ldots,N\}=\mathcal{I}_N$ is the set of nodes, and $\mathcal{E}$ is the set of edges between pairs of nodes; from here on, for any integer $M$, we use $\mathcal{I}_M$ to denote the set $\{1,2,\ldots,M\}$. Each edge $e=(i,j)\in\mathcal{E}$ is associated with a discrete memoryless channel with transition probability $P(Y_{i,j}\mid X_{i,j})$, input alphabet $\mathcal{X}_{i,j}$ and output alphabet $\mathcal{Y}_{i,j}$; these channels are assumed to be synchronized. Each node $i$ has a discrete memoryless source $S_i$ taking values in the alphabet $\mathcal{S}_i$, and the sources are jointly distributed according to $P(S_1,S_2,\ldots,S_N)$ at each time instance. For simplicity, we inherently assume that these sources are synchronized, so that the notation $P(S_1,S_2,\ldots,S_N)$ is meaningful. A length-$n$ vector of a source $S_i$ is written as $S_i^n$, and the $t$-th symbol in this vector is written as $S_i(t)$; i.e., $S_i^n=(S_i(1),S_i(2),\ldots,S_i(n))$. Similar notation is also used for other random variables. We use upper case for random variables, and lower case for their realizations. For any set $\mathcal{S}$, we write the $r$-th order product set as $\mathcal{S}^r$.

For each source, a distortion measure is defined in a general manner as $d:\mathcal{S}_i\times\hat{\mathcal{S}}_i\to[0,\infty)$, where $\hat{\mathcal{S}}_i$ is the reconstruction alphabet. A node $j$ may be interested in only a subset of the sources $\{S_i,i\in\mathcal{I}_N\}$; notationally, we write the set of sources that node $j$ is interested in as $\mathcal{T}_j$. Next we define the class of codes being considered for the distributed network source coding problem, which are conventional block codes.

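Definition 1: An $(m,n,\{d_{k,j},k\in\mathcal{T}_j,j\in\mathcal{I}_N\})$ distributed network joint source-channel code on a joint source-channel network $(\mathcal{V},\mathcal{E},\{\mathcal{T}_j,j\in\mathcal{I}_N\},P(S_1,S_2,\ldots,S_N),\prod_{(i,j)\in\mathcal{E}}P(Y_{i,j}\mid X_{i,j}))$ consists of the following components:

• At each transmitter node $i$, for each $j$ such that $(i,j)\in\mathcal{E}$, an encoding function for time instance $t$
$$\phi^{(t)}_{i,j}:\mathcal{S}_i^m\times\prod_{(k,i)\in\mathcal{E}}\mathcal{Y}_{k,i}^{t-1}\to\mathcal{X}_{i,j},\qquad t=1,2,\ldots,n. \tag{1}$$

• At each receiver node $j$, for each source $k\in\mathcal{T}_j$, a decoding function
$$\psi_{k,j}:\mathcal{S}_j^m\times\prod_{(i,j)\in\mathcal{E}}\mathcal{Y}_{i,j}^n\to\hat{\mathcal{S}}_k^m. \tag{2}$$

The encoding and decoding functions induce the distortions
$$d_{k,j}=\frac{1}{m}\sum_{t=1}^{m}\mathbb{E}\,d\big(S_k(t),\hat{S}_{k,j}(t)\big),\qquad j=1,2,\ldots,N,\ k\in\mathcal{T}_j,$$
where $\hat{S}_{k,j}$ is the reconstruction of source $S_k$ at node $j$.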
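A minimal sketch of the per-letter average that defines $d_{k,j}$, assuming the Hamming distortion measure (the text keeps $d$ general). The expectation is approximated by averaging over a list of realized (source, reconstruction) blocks, which is an illustrative stand-in rather than part of the definition; the helper names are hypothetical.

```python
from typing import Callable, Sequence, Tuple, TypeVar

A = TypeVar("A")  # source letters
B = TypeVar("B")  # reconstruction letters


def hamming(s: object, s_hat: object) -> float:
    """Hamming distortion (assumed example): 0 if the letters agree, 1 otherwise."""
    return 0.0 if s == s_hat else 1.0


def block_distortion(source: Sequence[A], recon: Sequence[B],
                     d: Callable[[A, B], float]) -> float:
    """(1/m) * sum_{t=1}^m d(s_k(t), s_hat_{k,j}(t)) for one realized length-m block."""
    if len(source) != len(recon) or len(source) == 0:
        raise ValueError("source and reconstruction must be non-empty and of equal length m")
    return sum(d(s, r) for s, r in zip(source, recon)) / len(source)


def average_over_blocks(pairs: Sequence[Tuple[Sequence[A], Sequence[B]]],
                        d: Callable[[A, B], float]) -> float:
    """Empirical stand-in for the expectation: mean of block_distortion over realized blocks."""
    return sum(block_distortion(s, r, d) for s, r in pairs) / len(pairs)
```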

Here $m$ is the source block length and $n$ is the channel block length, which implies a source-channel bandwidth mismatch factor of $\kappa=n/m$ (channel uses per source sample). Note that if a node is not interested in a certain source, the distortion of the reconstruction at this node can simply be assumed to be large. Thus, without loss of generality, we can write a distortion matrix whose element $d_{k,j}$ is the distortion associated with the reconstruction of source $S_k$ at node $j$. Clearly, without loss of generality, we can let $d_{j,j}=0$ and simply define $d_{i,j}=d_i^{\max}$ for $i\neq j$ with $i\notin\mathcal{T}_j$, where $d_i^{\max}$ is the distortion achievable at rate zero for source $S_i$. With this in mind, the region of achievable distortion matrices can be defined as follows.

Definition 2: A distortion matrix $\mathbf{D}$ is achievable for distributed network joint source-channel coding with bandwidth mismatch factor $\kappa$ on a joint source-channel network $(\mathcal{V},\mathcal{E},\{\mathcal{T}_j,j\in\mathcal{I}_N\},P(S_1,S_2,\ldots,S_N),\prod_{(i,j)\in\mathcal{E}}P(Y_{i,j}\mid X_{i,j}))$, if for any $\epsilon>0$ and sufficiently large $m$, there exist an integer $n\le\kappa m$ and an $(m,n,\{d_{k,j},k\in\mathcal{T}_j\})$ distributed network joint source-channel code such that $d_{i,j}\le D_{i,j}+\epsilon$, $i,j=1,2,\ldots,N$. The collection of all such distortion matrices is the distributed network joint source-channel coding achievable distortion region, denoted as $\mathcal{D}_{\rm dis}$.
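The distortion-matrix convention above ($d_{j,j}=0$, and $d_i^{\max}$ for sources a node does not want) can be sketched as follows. This assumes finite alphabets given as Python dictionaries, nodes indexed from 0 rather than 1, and hypothetical helper names; $d_i^{\max}$ is computed as the smallest expected distortion of a constant reconstruction, which is what rate zero allows.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def rate_zero_distortion(pmf: Dict[int, float], recon_alphabet: Sequence[int],
                         d: Callable[[int, int], float]) -> float:
    """d_max: smallest expected distortion using one fixed reconstruction letter (rate zero)."""
    return min(sum(p * d(s, r) for s, p in pmf.items()) for r in recon_alphabet)


def distortion_matrix(targets: Dict[Tuple[int, int], float], interest: List[Set[int]],
                      d_max: List[float]) -> List[List[float]]:
    """D[k][j] = 0 if k == j, targets[(k, j)] if k is in T_j, and d_max[k] otherwise.

    Nodes are indexed 0..N-1 here (the text uses 1..N); targets is hypothetical input.
    """
    N = len(interest)
    D = [[0.0] * N for _ in range(N)]
    for k in range(N):
        for j in range(N):
            if k == j:
                D[k][j] = 0.0
            elif k in interest[j]:
                D[k][j] = targets[(k, j)]
            else:
                D[k][j] = d_max[k]
    return D
```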

To discuss source-channel separation, it is important to clarify the individual source code and channel 
code used in the separation-based approach. To this end, we essentially need to define the pure source coding 
problem and the pure channel coding problem. The channel coding problem in the DNJSCC problem is 
simply the point-to-point channel capacity problem, and the codes used are naturally block channel codes. 
The source coding problem is more complex: intuitively speaking, it is the original problem when the 
noisy channels are replaced by noise-free bit-pipes. However, this statement needs to be made more precise 
because although the "bit-pipe" channels lead to a more intuitive understanding, in a network setting, we 
also need to specify the timing, and the causality relation of all the channel inputs and outputs; moreover, 
when the rates of the channels are not integers, the concept of "bit-pipe" channel is not easily defined, and 
this may cause further confusion. Our definition of the pure source coding problem given below naturally 
eliminates such confusions. It is important to note that the definition of the block source codes on this 
network needs to incorporate the interactive communication aspect carefully. 

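Definition 3: An $(m,l,\{L_{i,j},(i,j)\in\mathcal{E}\},\{d_{k,j},k\in\mathcal{T}_j\})$ distributed network source code with a total of $l$ sessions on a source communication network $(\mathcal{V},\mathcal{E},\{\mathcal{T}_j,j\in\mathcal{I}_N\},P(S_1,\ldots,S_N))$ consists of the following components:

• At each (transmitter) node $i$, for each $j$ such that $(i,j)\in\mathcal{E}$, an encoding function for transmission session $t=1,2,\ldots,l$,
$$\phi^{(t)}_{i,j}:\mathcal{S}_i^m\times\prod_{(k,i)\in\mathcal{E}}\mathcal{I}_{L_{k,i}}^{t-1}\to\mathcal{I}_{L_{i,j}}, \tag{3}$$
where $L_{i,j}$ and $L_{k,i}$ are positive integers.

• At each receiver node $j$, for each source $k\in\mathcal{T}_j$, a decoding function
$$\psi_{k,j}:\mathcal{S}_j^m\times\prod_{(i,j)\in\mathcal{E}}\mathcal{I}_{L_{i,j}}^{l}\to\hat{\mathcal{S}}_k^m. \tag{4}$$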



Without loss of generality, we can always assume that the minimum distortion for a given distortion measure is zero; see [39].

The encoding and decoding functions induce the distortions
$$d_{k,j}=\frac{1}{m}\sum_{t=1}^{m}\mathbb{E}\,d\big(S_k(t),\hat{S}_{k,j}(t)\big),\qquad j=1,2,\ldots,N,\ k\in\mathcal{T}_j,$$
where again $\hat{S}_{k,j}$ is the reconstruction of source $S_k$ at node $j$.

Definition 4: A rate-distortion-matrix tuple $(\{R_{i,j},(i,j)\in\mathcal{E}\},\mathbf{D})$ is achievable on a source communication network $(\mathcal{V},\mathcal{E},\{\mathcal{T}_j,j\in\mathcal{I}_N\},P(S_1,S_2,\ldots,S_N))$ if for any $\epsilon>0$, there exists an integer $l$ such that, for any sufficiently large $m$, there exists an $(m,l,\{L_{i,j},(i,j)\in\mathcal{E}\},\{d_{k,j},k\in\mathcal{T}_j\})$ distributed network source code with
$$R_{i,j}+\epsilon\ge\frac{l}{m}\log L_{i,j},\quad(i,j)\in\mathcal{E},\qquad\text{and}\qquad d_{i,j}\le D_{i,j}+\epsilon,\quad i,j=1,2,\ldots,N. \tag{5}$$
The collection of distortion matrices $\mathbf{D}$ for which the rate-distortion-matrix tuple $(\{R_{i,j},(i,j)\in\mathcal{E}\},\mathbf{D})$ is achievable for a given rate vector $\{R_{i,j},(i,j)\in\mathcal{E}\}$ is denoted as $\mathcal{D}_{\rm dis}(R_{i,j})$.

Roughly speaking, $\frac{l}{m}\log L_{i,j}$ is the rate of the "noise-free bit-pipe channel" on edge $e=(i,j)$ per source symbol, accumulated over the $l$ sessions. Each session has the same rate in this source coding problem, and the separation result we present will show that this additional requirement does not cause any loss of optimality. The above definition is given in the traditional block source coding framework, and there is no need to introduce the concept of "noise-free bit-pipe channels", which eliminates any possible confusion associated with it. In the above source code, there are a total of $l$ sessions of coding, which generate code indices at each node $j\in\mathcal{I}_N$; at the end of each session, the index $W_{j,k}\in\mathcal{I}_{L_{j,k}}$ produced in this session becomes available at the destination node $k$, and thus can be used by node $k$ in the next coding session. In other words, the encoding functions need to observe the causality constraints at the session level on this network, by allowing the coding operation at a node to utilize only the source at this node and all previous-session information from its incoming edges.
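The session-level causality in Definition 3 can be illustrated by a small simulation loop. The encoders are arbitrary placeholder callables (hypothetical, not constructions from this work); the loop only enforces the bookkeeping: in session $t$ the encoder on edge $(i,j)$ sees the source at node $i$ and the indices delivered on its incoming edges in sessions $1,\ldots,t-1$, and each index must lie in $\{0,\ldots,L_{i,j}-1\}$ (0-based here).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Edge = Tuple[int, int]
# Hypothetical encoder type: (source block at node i, {incoming edge: past-session indices}) -> index
Encoder = Callable[[Sequence[int], Dict[Edge, List[int]]], int]


def run_sessions(edges: List[Edge], sources: Dict[int, Sequence[int]],
                 encoders: Dict[Edge, Encoder], L: Dict[Edge, int],
                 l: int) -> Dict[Edge, List[int]]:
    """Run l coding sessions and return, per edge, the indices sent in sessions 1..l."""
    history: Dict[Edge, List[int]] = {e: [] for e in edges}
    for _ in range(l):
        # Freeze what has been delivered so far: encoders in this session see only past sessions.
        snapshot = {e: list(idx) for e, idx in history.items()}
        sent: Dict[Edge, int] = {}
        for (i, j) in edges:
            incoming = {e: snapshot[e] for e in edges if e[1] == i}
            w = encoders[(i, j)](sources[i], incoming)
            if not 0 <= w < L[(i, j)]:
                raise ValueError(f"index {w} outside I_L on edge {(i, j)}")
            sent[(i, j)] = w
        # Indices become available at their destinations only at the end of the session.
        for e, w in sent.items():
            history[e].append(w)
    return history
```

Because every encoder in a session reads the frozen snapshot, an index produced in session $t$ can influence other edges only from session $t+1$ on, which is exactly the causality constraint in (3).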

With the above definitions, it is clear that we can combine the source codes with capacity-achieving channel codes for each channel on the original communication network. More precisely, we can define the achievable distortion region using such a separation approach as
$$\mathcal{D}^*_{\rm dis}=\mathcal{D}_{\rm dis}(R_{i,j}=\kappa C_{i,j}), \tag{6}$$
where $C_{i,j}$ is the channel capacity between node $i$ and node $j$, which we sometimes also write as $C_e$ with $e=(i,j)\in\mathcal{E}$. Since increasing the rates can only enlarge $\mathcal{D}_{\rm dis}(R_{i,j})$, it is straightforward to see that
$$\mathcal{D}_{\rm dis}(R_{i,j}=\kappa C_{i,j})=\bigcup_{R_{i,j}\le\kappa C_{i,j}}\mathcal{D}_{\rm dis}(R_{i,j}). \tag{7}$$

The separation approach we take is indeed in the classical sense: not only are the source codes and the channel codes defined separately, but the digital codeword indices are also the only information interface between them during encoding and decoding.

We have already used $\mathcal{D}_{\rm dis}$ to denote the achievable distortion region for the joint coding problem, and here we slightly abuse the notation by using $\mathcal{D}_{\rm dis}(R_{i,j})$ to denote the distortion-rate function in this pure source coding problem. This does not cause any confusion, since the concept of rates does not naturally exist in the joint coding problem.

B. Definitions for Joint Source-Channel Multiple Unicast and Multiple Multicast with Distortions 

For this case, there are a total of $M$ mutually independent discrete memoryless sources, denoted as $S_i$, taking values in the alphabet $\mathcal{S}_i$ according to some distribution $P(S_i)$, $i=1,2,\ldots,M$; note that, unlike in the distributed network source coding problem of the last subsection, the index $i$ here is not related to the index of a node. For notational simplicity, we shall assume that all the sources are synchronized. The distortion measures are defined similarly as in the last subsection; here we do not allow multiple distortion measures for the same source, or distortion measures defined on functions of more than one source.

Let the number of nodes be $N$. For the JSCMUD and JSCMMD problems, the graph-theoretic notation is not suitable, since multiuser communication channels with broadcast and multiple access signal interactions are involved. For notational simplicity, we treat the overall communication network as a single channel, with inputs $(X_1,X_2,\ldots,X_N)$ over the alphabets $\mathcal{X}_1\times\mathcal{X}_2\times\cdots\times\mathcal{X}_N$, outputs $(Y_1,Y_2,\ldots,Y_N)$ over the alphabets $\mathcal{Y}_1\times\mathcal{Y}_2\times\cdots\times\mathcal{Y}_N$, and transition probability $P(Y_{\mathcal{I}_N}\mid X_{\mathcal{I}_N})$; here $X_i$ and $Y_i$ are the channel input and output at node $i$, respectively. This model naturally includes feedback, and by using this notation we have inherently assumed that the channels are synchronized. Note that this general channel model includes the case in which the overall network consists of many independent individual multiuser channels; moreover, some of the inputs and outputs can simply be set as constant if the particular node does not send or receive over the channel. For simplicity, we shall also assume that the channel is memoryless.

Each source $S_i$ can be present at several nodes, and for each node $j\in\mathcal{I}_N$, we denote the set of sources present at node $j$ as $\mathcal{S}_j$. The receiver demands are defined as follows:

• Joint source-channel multiple unicast with distortion: each source is to be reconstructed (with or without distortion) at a single destination. Again denote for receiver node $j$ the set of sources it is interested in as $\mathcal{T}_j$; then we have $\mathcal{T}_j\cap\mathcal{T}_k=\emptyset$ for any $j\neq k$.

• Joint source-channel multiple multicast with distortion: each source is to be reconstructed (with or without distortion) at multiple destinations, i.e., it is possible that $\mathcal{T}_j\cap\mathcal{T}_k\neq\emptyset$.
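The two demand structures differ only in whether the sets $\mathcal{T}_j$ overlap. A small sketch, with hypothetical helper names and demands given as a dictionary from node to set of source indices, that tests the multiple unicast condition and also inverts the demands into the destination sets $\{j: i\in\mathcal{T}_j\}$ used later by the JSCMMD separation scheme:

```python
from itertools import combinations
from typing import Dict, Set


def is_multiple_unicast(demands: Dict[int, Set[int]]) -> bool:
    """True if the demand sets T_j are pairwise disjoint, i.e., no source has two destinations."""
    return all(demands[a].isdisjoint(demands[b]) for a, b in combinations(demands, 2))


def destinations(demands: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Invert the demands: for each source i, the set of nodes j with i in T_j."""
    dest: Dict[int, Set[int]] = {}
    for j, wanted in demands.items():
        for i in wanted:
            dest.setdefault(i, set()).add(j)
    return dest
```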

Next we define the class of codes being considered in this work, which are again block codes. 
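Definition 5: An $(m,n,d_1,d_2,\ldots,d_M)$ JSCMUD code on a joint source-channel communication network $(\{\mathcal{S}_j,j\in\mathcal{I}_N\},\{\mathcal{T}_j,j\in\mathcal{I}_N\},\prod_{i=1}^{M}P(S_i),P(Y_{\mathcal{I}_N}\mid X_{\mathcal{I}_N}))$ consists of the following components:

• At each transmitter node $j$, an encoding function for (time) index $t$
$$\phi^{(t)}_j:\prod_{i\in\mathcal{S}_j}\mathcal{S}_i^m\times\mathcal{Y}_j^{t-1}\to\mathcal{X}_j,\qquad t=1,2,\ldots,n. \tag{8}$$

• At each receiver node $j$, for each source $k\in\mathcal{T}_j$, a decoding function
$$\psi_{k,j}:\mathcal{Y}_j^n\times\prod_{i\in\mathcal{S}_j}\mathcal{S}_i^m\to\hat{\mathcal{S}}_k^m. \tag{9}$$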
The encoding and decoding functions induce the distortion
$$d_k=\frac{1}{m}\sum_{t=1}^{m}\mathbb{E}\,d\big(S_k(t),\hat{S}_k(t)\big),\qquad k=1,2,\ldots,M,$$
where $\hat{S}_k(t)$ is the reconstruction of source $S_k$ at the node $j$ with $k\in\mathcal{T}_j$ (unique under multiple unicast).

Definition 6: A distortion vector $(D_1,D_2,\ldots,D_M)$ is achievable for JSCMUD on a joint source-channel communication network $(\{\mathcal{S}_j,j\in\mathcal{I}_N\},\{\mathcal{T}_j,j\in\mathcal{I}_N\},\prod_{i=1}^{M}P(S_i),P(Y_{\mathcal{I}_N}\mid X_{\mathcal{I}_N}))$ with bandwidth mismatch factor $\kappa$, if for any $\epsilon>0$ and sufficiently large $m$, there exist an integer $n\le\kappa m$ and an $(m,n,d_1,d_2,\ldots,d_M)$ JSCMUD code such that $d_i\le D_i+\epsilon$, $i=1,2,\ldots,M$. The collection of all such distortion vectors is the achievable JSCMUD distortion region, denoted as $\mathcal{D}_{\rm uni}$.

Next we need to define the source codes and the channel codes extracted for the separation-based approach. For the JSCMUD problem, the source codes are conventional lossy source codes. In the channel coding problem we consider, each source $S_i$ is replaced with a message $W_i$ of cardinality $L_i$ with a uniform distribution; moreover, these messages are mutually independent. The precise channel code definition is as follows.

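Definition 7: An $(n,L_1,L_2,\ldots,L_M,P_{\rm err})$ multiple unicast channel code on a communication network $(\{\mathcal{S}_j,j\in\mathcal{I}_N\},\{\mathcal{T}_j,j\in\mathcal{I}_N\},P(Y_{\mathcal{I}_N}\mid X_{\mathcal{I}_N}))$ consists of the following components:

• At each transmitter node $j$, an encoding function for (time) index $t$
$$\phi^{(t)}_j:\prod_{i\in\mathcal{S}_j}\mathcal{I}_{L_i}\times\mathcal{Y}_j^{t-1}\to\mathcal{X}_j,\qquad t=1,2,\ldots,n. \tag{10}$$

• At each receiver node $j$, for each message $W_k$ where $k\in\mathcal{T}_j$, a decoding function
$$\psi_{k,j}:\mathcal{Y}_j^n\times\prod_{i\in\mathcal{S}_j}\mathcal{I}_{L_i}\to\mathcal{I}_{L_k}. \tag{11}$$

Denote the decoded message as $\hat{W}_i$ at node $j$, where $i\in\mathcal{T}_j$; the encoding and decoding functions induce the average decoding error probability
$$P_{\rm err}=\Pr\Big(\bigcup_{i=1}^{M}\{\hat{W}_i\neq W_i\}\Big). \tag{12}$$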

Definition 8: A rate vector $(R_1,R_2,\ldots,R_M)$ is achievable for multiple unicast channel coding on a communication network $(\{\mathcal{S}_j,j\in\mathcal{I}_N\},\{\mathcal{T}_j,j\in\mathcal{I}_N\},P(Y_{\mathcal{I}_N}\mid X_{\mathcal{I}_N}))$, if for any $\epsilon>0$ and sufficiently large $n$, there exists an $(n,L_1,L_2,\ldots,L_M,\epsilon)$ multiple unicast channel code such that $R_i\le\frac{1}{n}\log L_i+\epsilon$, $i=1,2,\ldots,M$. The collection of such achievable rate vectors is the capacity region of the network, denoted as $\mathcal{C}_{\rm uni}$.

Using conventional rate-distortion codes on each source and combining them with the multiple unicast channel codes defined above, we immediately obtain an achievable distortion region, which will be denoted as $\mathcal{D}^*_{\rm uni}$. More precisely, we can write
$$\mathcal{D}^*_{\rm uni}=\bigcup_{(R_1,\ldots,R_M)\in\mathcal{C}_{\rm uni}}\{(D_1,D_2,\ldots,D_M):D_i\ge D_i(\kappa R_i),\ i=1,2,\ldots,M\}, \tag{13}$$
where $D_i(\cdot)$ is the distortion-rate function of the source $S_i$.
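For intuition about (13), the sketch below computes the corner point $(D_1(\kappa R_1),\ldots,D_M(\kappa R_M))$ for one given rate vector, assuming Bernoulli sources with Hamming distortion (an example of our choosing, in bits), where $D(R)=h^{-1}(h(p)-R)$ for $R<h(p)$ and $0$ otherwise. It does not compute $\mathcal{C}_{\rm uni}$, which is generally unknown; the rate vector is simply an input assumed to lie in it, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def h2(x: float) -> float:
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def h2_inv(y: float, tol: float = 1e-12) -> float:
    """Inverse of h2 on [0, 1/2] by bisection (h2 is increasing there)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def distortion_rate_bernoulli(p: float, rate: float) -> float:
    """D(R) under Hamming distortion for a Bernoulli(p) source; rate in bits per sample."""
    p = min(p, 1 - p)
    if rate >= h2(p):
        return 0.0
    return h2_inv(h2(p) - rate)


def separation_corner(ps: Sequence[float], rates: Sequence[float], kappa: float) -> List[float]:
    """(D_1(kappa R_1), ..., D_M(kappa R_M)) for one rate vector assumed to lie in C_uni."""
    return [distortion_rate_bernoulli(p, kappa * r) for p, r in zip(ps, rates)]
```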

In the case of JSCMMD, a source is to be reconstructed, possibly with different distortions, at multiple destinations. The JSCMMD codes are defined in the same manner as in the case of JSCMUD, and thus the detailed definitions are omitted here. The achievable distortion matrix and the achievable distortion region $\mathcal{D}_{\rm mul}$ can also be defined accordingly.

The source-channel separation scheme for JSCMMD is slightly more involved. Consider first source $S_i$, and assume it is to be reconstructed in a lossy manner at the nodes in the set $\mathcal{A}_i=\{j:i\in\mathcal{T}_j\}$. The source codes we shall consider are successive refinement codes [11], and source $S_i$ is encoded in $|\mathcal{A}_i|$ stages, where the operator $|\cdot|$ denotes the cardinality of a set. For the channel codes in the separation approach, we shall consider a generalized version of the two-user degraded message set problem [13]. More precisely, in the given source communication network, let us fix an order $O_i$ for the elements in the set $\mathcal{A}_i$, for each $i=1,2,\ldots,M$. The source $S_i$ is replaced with a total of $|\mathcal{A}_i|$ messages, denoted as $W_{i,j}$, whose rates are $R_{i,O_i(j)}$, $j=1,2,\ldots,|\mathcal{A}_i|$, where $O_i(j)$ is the $j$-th element in the order $O_i$. The $k$-th node in the order $O_i$ is required to reconstruct the first $k$ messages $W_{i,j}$, $j=1,2,\ldots,k$. We can now define the capacity region $\mathcal{C}_{\rm mul}(O_1,O_2,\ldots,O_M)$ for this generalized degraded message set problem, which depends on the set of orders $\mathcal{O}=(O_1,O_2,\ldots,O_M)$; see the JSCMMD example in Section II-C, where $\mathcal{A}_1=\{3,4\}$ and the specific order discussed is $O_1=(3,4)$.

Clearly, the degraded message set scenario naturally sets the stage for successive refinement source codes, and by combining these two components, we arrive at an achievable distortion region using the separation approach for a given set of orders $\mathcal{O}$. We shall denote this achievable region as $\mathcal{D}^*_{\rm mul}(\mathcal{O})$.
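The message bookkeeping behind the generalized degraded message set problem can be sketched as follows: for source $S_i$ and an order $O_i$, the $k$-th node in the order must decode $W_{i,1},\ldots,W_{i,k}$, so its required total rate is the running sum of the message rates. The function names and the dictionary encoding of the rates $R_{i,O_i(j)}$ are our own assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def required_messages(i: int, order: Sequence[int]) -> Dict[int, List[Tuple[int, int]]]:
    """Node O_i(k) must decode W_{i,1}, ..., W_{i,k}; messages are labelled (i, j)."""
    return {node: [(i, j) for j in range(1, k + 1)] for k, node in enumerate(order, start=1)}


def cumulative_rates(order: Sequence[int], rate_of: Dict[int, float]) -> Dict[int, float]:
    """rate_of[O_i(j)] plays the role of R_{i,O_i(j)}, the rate of W_{i,j}.

    Returns, for each node in the order, the total rate of the messages it must decode.
    """
    total, out = 0.0, {}
    for node in order:
        total += rate_of[node]
        out[node] = total
    return out
```

For the example order $O_1=(3,4)$, node 3 decodes $W_{1,1}$ only, while node 4 decodes both $W_{1,1}$ and $W_{1,2}$.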

IV. Optimality of Separation for Distributed Network Joint Source-Channel Coding 

Our first main result is the following theorem, which formally states that the joint coding achievable 
distortion region is the same as separation coding achievable distortion region in the distributed network 
joint source-channel coding problem. 

Theorem 1: For any discrete memoryless distributed network joint source-channel coding problem (with orthogonal communication channels), we have $\mathcal{D}_{\rm dis}=\mathcal{D}^*_{\rm dis}$.

The following uniform Markov lemma is needed in the proof of this theorem; it can be found in pT | - [43]. This lemma is an alternative version of the well-known Markov lemma [Ell, and it has been used more recently in p4|. We restate it below using notation more convenient for us.

Lemma 1: Let $X\to Y\to Z$ be a Markov string in finite alphabets. For any fixed strongly jointly typical sequence pair $(x^n,y^n)$, let $Z^n$ be chosen uniformly at random from the set consisting of all sequences $z^n$ that are strongly jointly typical with $y^n$. Let $Q(\cdot)$ be the probability measure induced by this random choice. Then
$$\lim_{n\to\infty}Q\big((x^n,y^n,Z^n)\text{ are not strongly jointly typical}\big)=0,$$
and the convergence is uniform over the set of strongly jointly typical sequence pairs $(x^n,y^n)$.

The lemma given in [43] makes the asymptotically small quantities in the definition of strongly jointly typical sequences more explicit; these are, however, of little consequence here.

Proof of Theorem 1:

We first show that $\mathcal{D}_{\rm dis}\supseteq\mathcal{D}^*_{\rm dis}$. Consider an $l$-session length-$m$ distributed network source code with rates $\{R_{i,j},(i,j)\in\mathcal{E}\}$. With the bandwidth mismatch factor $\kappa$, we have a total of $\kappa m$ channel uses, and we shall partition them into $l$ channel sessions, each with $\lfloor\kappa m/l\rfloor$ channel uses; for simplicity, we shall assume that $\kappa m/l$ is an integer. Thus the channel on edge $(i,j)$ in each session can support a message of cardinality up to $\lfloor\exp(\frac{\kappa m}{l}C_{i,j})\rfloor$ with vanishing error probability, where we assume the natural logarithm in the definition of mutual information. Each session of the source code has a message output of cardinality $L_{i,j}=\lfloor\exp(\frac{m}{l}R_{i,j})\rfloor$. Thus, omitting the rounding issue, as long as $\exp(\frac{m}{l}R_{i,j})<\exp(\frac{\kappa m}{l}C_{i,j})$, i.e., $R_{i,j}<\kappa C_{i,j}$, we can use digital channel codes to transmit the source codes reliably (with vanishing error probability). Since the sources are discrete with finite alphabets and the distortion measures are bounded, a vanishing error probability increases the distortion only by a vanishing amount, and it follows that indeed
$$\mathcal{D}_{\rm dis}\supseteq\bigcup_{R_{i,j}<\kappa C_{i,j}}\mathcal{D}_{\rm dis}(\{R_{i,j},(i,j)\in\mathcal{E}\}). \tag{14}$$
Because the achievable distortion region $\mathcal{D}_{\rm dis}$ is a closed set and the distortion-matrix-rate function $\mathcal{D}_{\rm dis}(R_{i,j})$ is continuous, the condition $R_{i,j}<\kappa C_{i,j}$ can be replaced by $R_{i,j}\le\kappa C_{i,j}$ and the rounding issues can be safely ignored, resulting in the achievable region $\mathcal{D}^*_{\rm dis}$. Thus we have $\mathcal{D}_{\rm dis}\supseteq\mathcal{D}^*_{\rm dis}$.
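The per-session arithmetic in this argument can be checked numerically. The sketch below (hypothetical function name, natural logarithms as in the text) returns the number of channel uses per session, the source-message cardinality $\lfloor\exp(\frac{m}{l}R_{i,j})\rfloor$, and the channel-message cardinality $\lfloor\exp(\frac{\kappa m}{l}C_{i,j})\rfloor$; it is only bookkeeping and says nothing about error probabilities.

```python
import math
from typing import Tuple


def session_budget(m: int, l: int, kappa: float, R: float, C: float) -> Tuple[int, int, int]:
    """Return (channel uses per session, source-message cardinality, channel-message cardinality).

    Natural logarithms throughout; R is per source symbol and C is per channel use.
    """
    uses = kappa * m / l
    if abs(uses - round(uses)) > 1e-9:
        raise ValueError("the argument assumes kappa*m/l is an integer")
    L_src = math.floor(math.exp(m * R / l))   # floor(exp((m/l) R_ij))
    L_ch = math.floor(math.exp(uses * C))     # floor(exp((kappa m/l) C_ij))
    return int(round(uses)), L_src, L_ch
```

For example, with $m=100$, $l=10$, $\kappa=1$, $R=0.3$ and $C=0.5$ nats, each session has 10 channel uses and the source message set (cardinality 20) fits within the channel message set (cardinality 148), consistent with $R<\kappa C$.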

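Next we focus on the direction $\mathcal{D}_{\rm dis}\subseteq\mathcal{D}^*_{\rm dis}$. For any achievable distortion matrix $\mathbf{D}$, there exists a sequence of distributed network joint source-channel codes approaching it. In other words, for any $\epsilon>0$, there exists an $(m,n,\{D_{k,j}+\epsilon,k\in\mathcal{T}_j\})$ distributed network joint source-channel code (see Definition 1 and Definition 2), where $n\le\kappa m$. It is instructive to first examine the joint probability distribution induced by this particular code. Let $X_{\mathcal{E}}(t)$ and $Y_{\mathcal{E}}(t)$ denote the collection of channel inputs and outputs at time $t$; similarly, we use $S^m$ to denote the collection of all the source vectors in the network. At $t=1$, $X_{\mathcal{E}}(1)$ is a function of $S^m$, i.e., $X_{\mathcal{E}}(1)=\phi^{(1)}_{\mathcal{E}}(S^m)$ in the notation of (1); $X_{\mathcal{E}}(1)$ generates $Y_{\mathcal{E}}(1)$ via $|\mathcal{E}|$ orthogonal channels; note that $S^m\to X_{\mathcal{E}}(1)\to Y_{\mathcal{E}}(1)$ form a Markov chain and $P(Y_{\mathcal{E}}(1)\mid X_{\mathcal{E}}(1))=\prod_{(i,j)\in\mathcal{E}}P(Y_{i,j}(1)\mid X_{i,j}(1))$ since the channels are orthogonal. At $t=2$, $X_{\mathcal{E}}(2)$ is a function of $S^m$ and $Y_{\mathcal{E}}(1)$, i.e., $X_{\mathcal{E}}(2)=\phi^{(2)}_{\mathcal{E}}(S^m,Y_{\mathcal{E}}(1))$; $X_{\mathcal{E}}(2)$ further generates $Y_{\mathcal{E}}(2)$ via $|\mathcal{E}|$ orthogonal channels. Successively, at time $t$, we have

• Condition one: $X_{\mathcal{E}}(t)=\phi^{(t)}_{\mathcal{E}}(S^m,Y_{\mathcal{E}}^{t-1})$;

• Condition two: $(S^m,X_{\mathcal{E}}^{t-1},Y_{\mathcal{E}}^{t-1})\to X_{\mathcal{E}}(t)\to Y_{\mathcal{E}}(t)$ form a Markov chain, and $P(Y_{\mathcal{E}}(t)\mid X_{\mathcal{E}}(t))=\prod_{(i,j)\in\mathcal{E}}P(Y_{i,j}(t)\mid X_{i,j}(t))$.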

We next show that if a distortion matrix $\mathbf{D}$ is achievable in the joint coding problem given by Definition 1 and Definition 2, which is denoted as $P_j$, then the rate-distortion-matrix pair $(\{\kappa C_e\},\mathbf{D})$ is also achievable in the pure source coding problem given by Definition 3 and Definition 4, which is denoted as $P_s$. For this purpose, we shall construct an $n$-session distributed network source code for $P_s$ that operates on a source sequence of length $mn'$. Definition 3 and Definition 4 are rather crucial in the discussion below, and readers are encouraged to familiarize themselves with them before proceeding.

We first partition the source sequence $S_i^{mn'}$, $i=1,2,\ldots,N$, into $n'$ disjoint block components, each of length $m$. The $v$-th block component of $S_i^{mn'}$ is written as $S_i^m(v)$, i.e.,
$$S_i^m(v)\triangleq\big(S_i((v-1)m+1),S_i((v-1)m+2),\ldots,S_i(vm)\big).$$
To make this partition explicit, $S_i^{mn'}$ is written in the sequel as $S_i^{m,(n')}$.
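The partition into block components is a plain reshaping; a minimal sketch, with 0-based slicing standing in for the 1-based indices above (function name hypothetical):

```python
from typing import List, Sequence


def partition_blocks(seq: Sequence[int], m: int) -> List[List[int]]:
    """Split a length-(m n') sequence into S^m(1), ..., S^m(n'); block v covers (v-1)m+1..vm."""
    if m <= 0 or len(seq) % m != 0:
        raise ValueError("length must be a positive multiple of m")
    n_prime = len(seq) // m
    return [list(seq[(v - 1) * m: v * m]) for v in range(1, n_prime + 1)]
```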

For each $(i,j)\in\mathcal{E}$ and each session $t=1,2,\ldots,n$, a source coding codebook $\mathcal{C}_{(i,j),t}$ of size $\exp(n'(I(X_{i,j}(t);Y_{i,j}(t))+\delta))$ is generated by choosing codewords uniformly at random, with replacement, from the strongly typical set of the random variable $Y_{i,j}(t)$. This codebook is revealed to both the encoder and the decoder on edge $(i,j)$ in the problem $P_s$.
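The codebook construction and the encoding steps that follow rely on strong (joint) typicality tests. A sketch of such a test, following the textbook definition in [38] (every empirical frequency within $\delta/|\mathcal{X}|$ of the probability, and no zero-probability letters): the pmf is assumed to list the entire alphabet, and joint typicality is obtained by applying the same test to the sequence of pairs. Names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def strongly_typical(seq: Sequence[Hashable], pmf: Dict[Hashable, float], delta: float) -> bool:
    """Strong typicality test; pmf must list every letter of the alphabet (possibly with 0)."""
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    # Letters of probability zero (or outside the alphabet) must not occur at all.
    if any(pmf.get(a, 0.0) == 0.0 for a in counts):
        return False
    tol = delta / len(pmf)
    return all(abs(counts[a] / n - p) < tol for a, p in pmf.items())


def jointly_typical(xs: Sequence[Hashable], ys: Sequence[Hashable],
                    joint_pmf: Dict[Hashable, float], delta: float) -> bool:
    """Strong joint typicality: the same test applied to the sequence of pairs (x(t), y(t))."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    return strongly_typical(list(zip(xs, ys)), joint_pmf, delta)
```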

Now consider encoding for session $t=1$ at any given edge $(i,j)\in\mathcal{E}$. We first apply the original joint source-channel encoding function $\phi^{(1)}_{i,j}$ on each block component $s_i^m(v)$, $v=1,2,\ldots,n'$; i.e., $x_{i,j}(1,(v))=\phi^{(1)}_{i,j}(s_i^m(v))$. The outputs are then concatenated to produce a length-$n'$ vector $x^{(n')}_{i,j}(1)$,
$$x^{(n')}_{i,j}(1)=\big(x_{i,j}(1,(1)),x_{i,j}(1,(2)),\ldots,x_{i,j}(1,(n'))\big). \tag{15}$$
For each $(i,j)\in\mathcal{E}$, if $x^{(n')}_{i,j}(1)$ is strongly typical, we then find a codeword $y^{(n')}_{i,j}(1)$ in $\mathcal{C}_{(i,j),1}$ such that $x^{(n')}_{i,j}(1)$ and $y^{(n')}_{i,j}(1)$ are strongly jointly typical with respect to $P(X_{i,j}(1),Y_{i,j}(1))$; if no such codeword exists, an error is declared. Denote the index of this chosen codeword $y^{(n')}_{i,j}(1)$ in $\mathcal{C}_{(i,j),1}$ as $w_{i,j}(1)$; the $v$-th location in the vector $y^{(n')}_{i,j}(1)$ is written as $y_{i,j}(1,(v))$. For notational simplicity, we shall also write
$$y^{t}_{i,j}(v)=\big(y_{i,j}(1,(v)),y_{i,j}(2,(v)),\ldots,y_{i,j}(t,(v))\big). \tag{16}$$
The new encoding functions $\tilde{\phi}^{(1)}_{i,j}$ for the distributed network source coding problem (i.e., $P_s$) are given by
$$\tilde{\phi}^{(1)}_{i,j}\big(s_i^{m,(n')}\big)=w_{i,j}(1),\qquad(i,j)\in\mathcal{E}. \tag{17}$$

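We continue to construct the new code for the source coding problem $P_s$. In the $t$-th session, for any given edge $(i,j)\in\mathcal{E}$, the original joint source-channel encoding function $\phi^{(t)}_{i,j}$ is applied, and the outputs are concatenated (see Fig. 5), i.e.,
$$x^{(n')}_{i,j}(t)=\Big(\phi^{(t)}_{i,j}\big(s_i^m(1),\{y^{t-1}_{k,i}(1),(k,i)\in\mathcal{E}\}\big),\phi^{(t)}_{i,j}\big(s_i^m(2),\{y^{t-1}_{k,i}(2),(k,i)\in\mathcal{E}\}\big),\ldots,\phi^{(t)}_{i,j}\big(s_i^m(n'),\{y^{t-1}_{k,i}(n'),(k,i)\in\mathcal{E}\}\big)\Big). \tag{18}$$
Here node $i$ can form $y^{t-1}_{k,i}(v)$ because it knows the codebooks $\mathcal{C}_{(k,i),1},\ldots,\mathcal{C}_{(k,i),t-1}$ and has received the indices $w_{k,i}(1),\ldots,w_{k,i}(t-1)$. Then, for any $(i,j)\in\mathcal{E}$, if $x^{(n')}_{i,j}(t)$ is strongly typical, find a codeword $y^{(n')}_{i,j}(t)$ in $\mathcal{C}_{(i,j),t}$ such that $x^{(n')}_{i,j}(t)$ and $y^{(n')}_{i,j}(t)$ are strongly jointly typical with respect to $P(X_{i,j}(t),Y_{i,j}(t))$; if no such codeword exists, an error is declared. The index of the chosen codeword $y^{(n')}_{i,j}(t)$ in $\mathcal{C}_{(i,j),t}$ is denoted as $w_{i,j}(t)$, and thus the new encoding functions $\tilde{\phi}^{(t)}_{i,j}$ for the distributed network source coding problem (i.e., $P_s$) are given by
$$\tilde{\phi}^{(t)}_{i,j}\big(s_i^{m,(n')},\{w_{k,i}(1),\ldots,w_{k,i}(t-1),(k,i)\in\mathcal{E}\}\big)=w_{i,j}(t),\qquad(i,j)\in\mathcal{E}. \tag{19}$$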
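After $n$ sessions of encoding, at node $j\in\mathcal{V}$, we apply the original joint source-channel decoding function $\psi_{k,j}$ to reconstruct the $v$-th block component of source $s_k^{m,(n')}$, i.e.,
$$\hat{s}^m_{k,j}(v)=\psi_{k,j}\big(s_j^m(v),\{y^n_{i,j}(v),(i,j)\in\mathcal{E}\}\big),\qquad v=1,2,\ldots,n', \tag{20}$$
which are then concatenated to form $\hat{s}^{m,(n')}_{k,j}$, i.e., the length-$mn'$ reconstruction of source $k$ at node $j$. Thus the new decoding functions $\tilde{\psi}_{k,j}$ for the distributed network source coding problem (i.e., $P_s$) are given by
$$\tilde{\psi}_{k,j}\big(s_j^{m,(n')},\{w_{i,j}(1),\ldots,w_{i,j}(n),(i,j)\in\mathcal{E}\}\big)=\hat{s}^{m,(n')}_{k,j},\qquad k\in\mathcal{T}_j. \tag{21}$$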
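The reshaping in (18), where the same per-block encoder is applied to every block component and the outputs are stacked into one length-$n'$ vector, can be sketched as below. The per-block encoder `phi_t` is a hypothetical placeholder for $\phi^{(t)}_{i,j}$, block components are 0-based, and the covering step that maps the resulting vector to a codeword index is deliberately omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Edge = Tuple[int, int]
# Hypothetical per-block encoder standing in for phi^{(t)}_{i,j}:
# (s_i^m(v), {incoming edge (k, i): y_{k,i}^{t-1}(v)}) -> one channel input letter.
BlockEncoder = Callable[[Sequence[int], Dict[Edge, List[int]]], int]


def session_input_vector(phi_t: BlockEncoder, source_blocks: List[Sequence[int]],
                         past_outputs: Dict[Edge, List[List[int]]]) -> List[int]:
    """Return x_{i,j}^{(n')}(t): phi_t applied to each block component, concatenated.

    past_outputs[(k, i)][v] holds y_{k,i}^{t-1}(v) for block component v.
    """
    for e, outs in past_outputs.items():
        if len(outs) != len(source_blocks):
            raise ValueError(f"edge {e} must carry one output history per block component")
    return [phi_t(block, {e: outs[v] for e, outs in past_outputs.items()})
            for v, block in enumerate(source_blocks)]
```

In session $t=1$ there are no past outputs, so `past_outputs` is empty and the sketch reduces to (15).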

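There are three kinds of error events in session $t$:

• $(s^{m,(n')},x^{t,(n')}_{\mathcal{E}},y^{t-1,(n')}_{\mathcal{E}})$ are not strongly jointly typical with respect to $P(S^m,X^t_{\mathcal{E}},Y^{t-1}_{\mathcal{E}})$; this event is denoted as $E^{(1)}_t$. Note that if $E^{(1)}_t$ does not occur, then for all $(i,j)\in\mathcal{E}$, the sequence $x^{(n')}_{i,j}(t)$ is strongly typical.

• For an edge $(i,j)\in\mathcal{E}$, given that $x^{(n')}_{i,j}(t)$ is strongly typical, there does not exist any codeword in $\mathcal{C}_{(i,j),t}$ that is strongly jointly typical with $x^{(n')}_{i,j}(t)$ with respect to $P(X_{i,j}(t),Y_{i,j}(t))$; this event is denoted as $E^{(2)}_{t,(i,j)}$.

• $(s^{m,(n')},x^{t,(n')}_{\mathcal{E}},y^{t-1,(n')}_{\mathcal{E}})$ and $y^{(n')}_{\mathcal{E}}(t)$ are not strongly jointly typical with respect to $P(S^m,X^t_{\mathcal{E}},Y^t_{\mathcal{E}})$; this event is denoted as $E^{(3)}_t$.

The overall error event is given as
$$\begin{aligned}E_{n'}&=\bigcup_{t=1}^{n}\Big(E^{(1)}_t\cup\bigcup_{(i,j)\in\mathcal{E}}E^{(2)}_{t,(i,j)}\cup E^{(3)}_t\Big)\\&=E^{(0)}_1\cup\bigcup_{t=1}^{n}\Big(\big(E^{(0)c}_t\cap E^{(1)}_t\big)\cup\Big(E^{(1)c}_t\cap\bigcup_{(i,j)\in\mathcal{E}}E^{(2)}_{t,(i,j)}\Big)\cup\Big(\Big(E^{(1)}_t\cup\bigcup_{(i,j)\in\mathcal{E}}E^{(2)}_{t,(i,j)}\Big)^c\cap E^{(3)}_t\Big)\Big),\end{aligned} \tag{22}$$
where we use $\mathcal{S}^c$ to denote the complement of a set $\mathcal{S}$, and $E^{(0)}_t$ is the event that the sequences $(s^{m,(n')},x^{t-1,(n')}_{\mathcal{E}},y^{t-1,(n')}_{\mathcal{E}})$ are not strongly jointly typical.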

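By the union bound, we have
$$\Pr(E_{n'})\le\Pr(E^{(0)}_1)+\underbrace{\sum_{t=1}^{n}\Pr\big(E^{(0)c}_t\cap E^{(1)}_t\big)}_{P^{(1)}}+\underbrace{\sum_{t=1}^{n}\Pr\Big(E^{(1)c}_t\cap\bigcup_{(i,j)\in\mathcal{E}}E^{(2)}_{t,(i,j)}\Big)}_{P^{(2)}}+\underbrace{\sum_{t=1}^{n}\Pr\Big(\Big(E^{(1)}_t\cup\bigcup_{(i,j)\in\mathcal{E}}E^{(2)}_{t,(i,j)}\Big)^c\cap E^{(3)}_t\Big)}_{P^{(3)}}. \tag{23}$$

Next we show that $\Pr(E_{n'})\to0$ as $n'\to\infty$.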
It is clear that $\Pr(E^{(0)}_1)\to0$ by the basic properties of strongly jointly typical sequences ([38, pp. 358-362]). Since $x^{(n')}_{\mathcal{E}}(1)$ is a deterministic function of $s^{m,(n')}$, it is clear that $\Pr(E^{(1)}_1\cap E^{(0)c}_1)\to0$, and similarly $\Pr(E^{(1)}_t\cap E^{(0)c}_t)\to0$ for $t=2,3,\ldots,n$.

For the second summation in (23), again by the union bound,
$$\Pr\Big(E^{(1)c}_t\cap\bigcup_{(i,j)\in\mathcal{E}}E^{(2)}_{t,(i,j)}\Big)\le\sum_{(i,j)\in\mathcal{E}}\Pr\big(E^{(1)c}_t\cap E^{(2)}_{t,(i,j)}\big). \tag{24}$$
Since $E^{(1)c}_t$ implies that $x^{(n')}_{i,j}(t)$ is strongly typical, it is clear that $\Pr(E^{(1)c}_t\cap E^{(2)}_{t,(i,j)})\to0$ for any $t$ and $(i,j)\in\mathcal{E}$, by the properties of strongly typical sequences ([38, Lemma 13.6.2]) and the fact that the number of codewords in $\mathcal{C}_{(i,j),t}$ is $\exp(n'(I(X_{i,j}(t);Y_{i,j}(t))+\delta))$.

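To bound the third summation in (23), let us fix an arbitrary order for the edges in the set $\mathcal{E}$, and write it as $e_1,e_2,\ldots,e_{|\mathcal{E}|}$. Define $E^{(3)*}_{t,k}$ as the event that $(s^{m,(n')},x^{t,(n')}_{\mathcal{E}},y^{t-1,(n')}_{\mathcal{E}})$ and $(y^{(n')}_{e_1}(t),y^{(n')}_{e_2}(t),\ldots,y^{(n')}_{e_k}(t))$ are not strongly jointly typical. We can then rewrite
$$\Big(E^{(1)}_t\cup\bigcup_{(i,j)\in\mathcal{E}}E^{(2)}_{t,(i,j)}\Big)^c\cap E^{(3)}_t=\bigcup_{k=1}^{|\mathcal{E}|}\Big(\Big(E^{(1)}_t\cup\bigcup_{(i,j)\in\mathcal{E}}E^{(2)}_{t,(i,j)}\cup E^{(3)*}_{t,k-1}\Big)^c\cap E^{(3)*}_{t,k}\Big), \tag{25}$$
where $E^{(3)*}_{t,0}=\emptyset$. To bound the probability of the $k$-th term in (25), observe that
$$\big(S^m,X^t_{\mathcal{E}},Y^{t-1}_{\mathcal{E}},Y_{e_1}(t),\ldots,Y_{e_{k-1}}(t)\big)\to X_{e_k}(t)\to Y_{e_k}(t) \tag{26}$$
is a Markov string, which follows from Condition two and the orthogonality of the channels. Invoking Lemma 1 gives that this probability vanishes, for any $t$ and $k$, as $n'\to\infty$.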

There are a total of $n$ terms in the first summation of (23), a total of $n|\mathcal{E}|$ terms in the second summation (after applying (24)), and a total of $n|\mathcal{E}|$ terms in the third summation (after applying (25)). Since $n$ and $|\mathcal{E}|$ are fixed in the above construction, and each term can be made arbitrarily small by making $n'$ sufficiently large, it is seen that $\Pr(E_{n'})\to0$ as $n'\to\infty$. This implies that the sequences $(s^{m,(n')},x^{n,(n')}_{\mathcal{E}},y^{n,(n')}_{\mathcal{E}})$ are strongly jointly typical with respect to the original distribution $P(S^m,X^n_{\mathcal{E}},Y^n_{\mathcal{E}})$ with probability approaching one as $n'\to\infty$. This further implies that $s_k^{m,(n')}$ and $\hat{s}_{k,j}^{m,(n')}$ are strongly jointly typical with respect to $P(S^m_k,\hat{S}^m_{k,j})$, and the new code induces a distortion of at most $D_{k,j}+\epsilon+\delta'$, where $\delta'\to0$ as $n'\to\infty$.

It remains to analyze the rates of this digital scheme, i.e., the cardinalities of the indices delivered during each session for the problem $P_s$. For each link $e=(i,j)\in\mathcal{E}$ and each session $t$, we have
$$I(X_e(t);Y_e(t))\le C_e,\qquad t=1,2,\ldots,n, \tag{27}$$
where $C_e$ is the capacity of the channel on edge $e$, since by the conventional channel coding theorem $C_e$ is the maximum of the mutual information over all input distributions. Thus the cardinality of the above digital codes for each session associated with any given link $e$ is bounded as
$$\exp\big(n'(I(X_e(t);Y_e(t))+\delta)\big)\le\exp\big(n'(C_e+\delta)\big),\qquad t=1,2,\ldots,n. \tag{28}$$
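Inequality (27) can be checked numerically for a particular channel. The sketch assumes, purely for illustration, that the edge is a binary symmetric channel with crossover probability $q$ and that $X_e(t)$ is Bernoulli($a$); it computes $I(X_e;Y_e)=H(Y_e)-H(Y_e\mid X_e)$ in nats and compares it with $C_e=\ln 2-h(q)$. Function names are hypothetical.

```python
import math


def hb_nats(x: float) -> float:
    """Binary entropy in nats, with hb(0) = hb(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def bsc_mutual_information(a: float, q: float) -> float:
    """I(X;Y) in nats for X ~ Bernoulli(a) sent through a BSC with crossover probability q."""
    b = a * (1 - q) + (1 - a) * q   # P(Y = 1)
    return hb_nats(b) - hb_nats(q)  # H(Y) - H(Y|X)


def bsc_capacity_nats(q: float) -> float:
    """C = ln 2 - hb(q), attained by the uniform input a = 1/2."""
    return math.log(2) - hb_nats(q)
```

Sweeping $a$ over $[0,1]$ shows the mutual information never exceeds the capacity, with equality at $a=1/2$.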
Fig. 5. Coding operation of $P_s$ in session $t+1$ for node $i$ with an incoming link $(k,i)$ and an outgoing link $(i,j)$. Each narrow horizontal box represents a vector; the vectors $y^t_{k,i}(v)$'s and $y^{t+1}_{i,j}(v)$'s are shaded partially because, at this point, the later parts have not been generated. Each component of the lossy encoder output, i.e., $y_{i,j}(t+1,(v))$, is appended to the existing $y^t_{i,j}(v)$ to form $y^{t+1}_{i,j}(v)$.



It follows that the following rate is achievable in the problem $P_s$:

$$R_e = \frac{n}{mn'} \log \exp\big(n'(C_e + \delta)\big) \le \kappa (C_e + \delta), \tag{29}$$

according to the definition of achievable rates for $P_s$. Thus we have shown that this newly constructed code can operate at the rate-distortion-matrix tuple $(\{R_e = \kappa(C_e + \delta)\}, \mathbf{D} + \epsilon + \delta')$ for the problem $P_s$, where $\delta$ and $\delta'$ can be made arbitrarily small by letting $n' \to \infty$. Since the achievable rate-distortion-matrix region for $P_s$ is a closed set, the asymptotically small terms $\delta$ and $\delta'$ can be safely ignored, and it follows that the rate-distortion-matrix tuple $(\{R_e = \kappa C_e\}, \mathbf{D} + \epsilon)$ is achievable in the problem $P_s$. Now, since the distortion matrix $\mathbf{D}$ is achievable in the joint coding problem $P_j$, for any $\epsilon > 0$ there exists an $(m, n, \{D_{k,j} + \epsilon\})$ joint source-channel code, where $n \le \kappa m$. It follows that $(\{R_e = \kappa C_e\}, \mathbf{D} + \epsilon)$ is an achievable rate-distortion-matrix tuple in the source coding problem $P_s$ for any $\epsilon > 0$, and again, since the achievable rate-distortion-matrix region for $P_s$ is a closed set, $(\{R_e = \kappa C_e\}, \mathbf{D})$ is indeed achievable for $P_s$. The proof of $\mathcal{D}_{\rm dis} \subseteq \mathcal{D}^*_{\rm dis}$ can then be completed by applying the definition of $\mathcal{D}^*_{\rm dis}$. ■
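The rate accounting in (28)-(29) can be sketched numerically. This is a minimal illustration under stated assumptions: information is measured in nats, the helper name `rate_per_source_sample` is hypothetical, and the numbers ($C_e$, $\delta$, $m$, $n$, $\kappa$) are made up; it only shows that the $n$ sessions of $\exp(n'(C_e+\delta))$ indices each, spread over $mn'$ source samples, give $\frac{n}{m}(C_e+\delta)$, independently of $n'$.

```python
def rate_per_source_sample(capacity_nats, delta, n, m, n_prime):
    """Rate on link e (nats per source sample) of the digital scheme, cf. (28)-(29).

    Each of the n sessions sends at most exp(n' (C_e + delta)) indices over link e,
    and the whole code covers m * n' source samples. Logs are taken directly to
    avoid overflowing exp().
    """
    log_num_indices = n * n_prime * (capacity_nats + delta)  # log of the product over n sessions
    return log_num_indices / (m * n_prime)

# Hypothetical numbers: C_e = 0.7 nats, delta = 0.01, m = 100, n = 150, kappa = 1.5
kappa = 1.5
r = rate_per_source_sample(0.7, 0.01, n=150, m=100, n_prime=1000)
print(r, kappa * (0.7 + 0.01))  # equal here because n = kappa * m; n' cancels out
```

Since $n \le \kappa m$ for the joint code, the same computation with a smaller $n$ only lowers the rate, which is where the inequality in (29) comes from.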

Remark: In the proof, the original joint source-channel code length (m, n) and the number of super 
blocks n' need to be controlled carefully. To arrive at the conclusion, we essentially need to drive both of 
them to infinity, but at different rates. This consideration also applies in the proof for the JSCMUD problem 
in the next section.

V. Optimality of Separation for Joint Source-Channel Multiple Unicast with Distortion

The main result of this section is the following theorem, which formally states that the joint coding achievable distortion region is the same as the separation coding achievable distortion region in the JSCMUD problem.

Theorem 2: For discrete and memoryless joint source-channel multiple unicast with distortion, we have $\mathcal{D}_{\rm uni} = \mathcal{D}^*_{\rm uni}$.

Proof of Theorem 2: The direction $\mathcal{D}^*_{\rm uni} \subseteq \mathcal{D}_{\rm uni}$ is rather obvious, and thus we focus on the other direction $\mathcal{D}_{\rm uni} \subseteq \mathcal{D}^*_{\rm uni}$. For any achievable distortion vector $(D_1, D_2, \ldots, D_M)$, there exists a sequence of JSCMUD codes approaching it. In other words, for any $\epsilon > 0$, there exists an $(m, n, D_1 + \epsilon, D_2 + \epsilon, \ldots, D_M + \epsilon)$ JSCMUD code, where $n \le \kappa m$ (see Definitions 5 and 6).

The sources and the above given block code induce a joint distribution

$$\prod_{i=1}^{M} P(S_i^m) \; P(\hat{S}_1^m, \hat{S}_2^m, \ldots, \hat{S}_M^m \mid S_1^m, S_2^m, \ldots, S_M^m), \tag{30}$$



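and we view the second term as the transition probability of a block-level interference channel, which has input alphabet $\mathcal{S}_1^m \times \mathcal{S}_2^m \times \cdots \times \mathcal{S}_M^m$ and output alphabet $\hat{\mathcal{S}}_1^m \times \hat{\mathcal{S}}_2^m \times \cdots \times \hat{\mathcal{S}}_M^m$. Moreover, by the conventional rate-distortion theorem [38], it is clear that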



$$I(S_i^m; \hat{S}_i^m) \ge m R_i(D_i + \epsilon), \quad i = 1, 2, \ldots, M, \tag{31}$$

where $R_i(\cdot)$ is the rate-distortion function of source $S_i$. This super interference channel operates in the same manner as a memoryless interference channel; however, it operates on the block level $(S_1^m, S_2^m, \ldots, S_M^m) \to (\hat{S}_1^m, \hat{S}_2^m, \ldots, \hat{S}_M^m)$, instead of on the single-time-instance level of the original channel $(X_1, X_2, \ldots, X_N) \to (Y_1, Y_2, \ldots, Y_N)$.

Next we show that if a distortion vector $(D_1, D_2, \ldots, D_M)$ is achievable in the joint coding problem defined in Definition 5 and Definition 6, which we again denote as $P_j$, then the rate vector $(R_1(D_1), R_2(D_2), \ldots, R_M(D_M))$ is achievable in the pure channel coding problem defined in Definition 7 and Definition 8, which we denote as $P_c$. For this purpose, we shall construct a multiple unicast channel code for $P_c$ using the aforementioned $(m, n, D_1 + \epsilon, D_2 + \epsilon, \ldots, D_M + \epsilon)$ JSCMUD code for $P_j$.

The coding scheme for $P_c$ can be formally described as follows. For each source $S_i$, $\exp(mn'(R_i(D_i + \epsilon) - \delta))$ codewords of length $mn'$ are generated independently according to the $mn'$-fold product distribution of $P(S_i)$; denote this codebook as $\mathcal{C}_i$. The codebooks are revealed to all the nodes.

To encode for $P_c$, given a message $w_i$, we choose the $w_i$-th codeword $s_i^{mn'}(w_i)$ in the codebook $\mathcal{C}_i$ generated above. Each codeword is partitioned into $n'$ blocks of equal length $m$; we denote the $v$-th block as $s_i^m(w_i,(v))$, and to emphasize this partition, we also write $s_i^{mn'}(w_i)$ as $s_i^{m,(n')}(w_i)$. For a fixed $v$, the blocks $(s_1^m(w_1,(v)), s_2^m(w_2,(v)), \ldots, s_M^m(w_M,(v)))$ from the chosen codewords at all the nodes can be viewed as the length-$m$ source vectors in the original JSCMUD problem, and thus the original $(m, n, D_1 + \epsilon, D_2 + \epsilon, \ldots, D_M + \epsilon)$ JSCMUD encoding and decoding functions can be used on them. This results in a set of reconstruction sequences $(\hat{s}_1^m(v), \hat{s}_2^m(v), \ldots, \hat{s}_M^m(v))$. At the end of the $n'$ blocks, we concatenate the reconstruction blocks of each source as $\hat{s}_i^{mn'} = \hat{s}_i^{m,(n')} = (\hat{s}_i^m(1), \hat{s}_i^m(2), \ldots, \hat{s}_i^m(n'))$.
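The codebook generation and block partition described above can be sketched as follows. This is an illustration only: the function names, the binary source distribution, and the tiny sizes are hypothetical, and the JSCMUD encoder that each block would be fed to is not modeled.

```python
import random

def generate_codebook(pmf, num_codewords, length, seed=0):
    """Draw num_codewords i.i.d. codewords of the given length from pmf (dict symbol -> prob)."""
    rng = random.Random(seed)
    symbols = list(pmf)
    weights = [pmf[s] for s in symbols]
    return [rng.choices(symbols, weights=weights, k=length) for _ in range(num_codewords)]

def split_into_blocks(codeword, m):
    """Partition a length-(m n') codeword into n' consecutive blocks of length m."""
    assert len(codeword) % m == 0
    return [codeword[v * m:(v + 1) * m] for v in range(len(codeword) // m)]

# Hypothetical binary source P(S_i) with m = 4, n' = 3 and a codebook of 8 codewords
m, n_prime = 4, 3
codebook = generate_codebook({0: 0.7, 1: 0.3}, num_codewords=8, length=m * n_prime, seed=1)
w = 5                                        # message index (0-based in this sketch)
blocks = split_into_blocks(codebook[w], m)   # blocks[v] plays the role of s_i^m(w_i,(v))
```

In the actual scheme the codebook size is $\exp(mn'(R_i(D_i+\epsilon)-\delta))$, far too large to enumerate; the sketch only shows the i.i.d. drawing and the $n'$-block partition.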

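Mathematically, let the original joint source-channel encoding and decoding functions at node $j$ be $f_j^{(t)}$ and $\psi_{j,k}$, respectively. Similarly to the notation $s_i^m(w_i,(v))$, the $v$-th length-$n$ block of the channel output $y_j^{nn'}$ at node $j$ is written as $y_j^n(v)$, and the first $t$ symbols of the block $y_j^n(v)$ are written as $y_j^t(v)$. Then, with $\mathscr{S}_j$ denoting the set of sources available at node $j$, the new channel encoding function at node $j$ is given by

$$x_j(v,t) = f_j^{(t)}\big(\{s_i^m(w_i,(v)),\ i \in \mathscr{S}_j\},\ y_j^{t-1}(v)\big), \quad v = 1, 2, \ldots, n', \; t = 1, 2, \ldots, n. \tag{32}$$

The reconstructions are simply

$$\hat{s}_k^m(v) = \psi_{j,k}\big(\{s_i^m(w_i,(v)),\ i \in \mathscr{S}_j\},\ y_j^n(v)\big), \quad \text{for each source } S_k \text{ destined to node } j. \tag{33}$$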

We proceed to construct the multiple unicast channel code decoder (for problem $P_c$). At node $j$, for each source $S_k$ destined to node $j$, look for a codeword in the codebook $\mathcal{C}_k$ that is (weakly) jointly typical [38] with $\hat{s}_k^{mn'}$ according to the distribution $P(S_k^m, \hat{S}_k^m)$, i.e., the corresponding marginal of (30). If there is a unique such codeword, the corresponding message $\hat{w}_k$ is declared; otherwise an error is declared.
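The decoding rule above is the textbook weak joint typicality test of [38]. The sketch below is a minimal single-letter version under stated assumptions: the helper names are hypothetical, entropies are in bits, and each "letter" is a scalar, whereas in the scheme a letter is a whole length-$m$ block (which could be represented as a tuple key in `p_xy`).

```python
import math

def is_weakly_jointly_typical(x, y, p_xy, eps):
    """Weak joint typicality of sequences x, y w.r.t. the i.i.d. pmf p_xy (dict (a, b) -> prob).

    Checks the empirical entropy conditions for X, Y and (X, Y) with slack eps (bits).
    """
    n = len(x)
    p_x, p_y = {}, {}
    for (a, b), p in p_xy.items():
        p_x[a] = p_x.get(a, 0.0) + p
        p_y[b] = p_y.get(b, 0.0) + p

    def entropy(pmf):
        return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

    def empirical(seq_probs):
        if any(p == 0 for p in seq_probs):
            return math.inf  # a zero-probability sequence is never typical
        return -sum(math.log2(p) for p in seq_probs) / n

    checks = [
        (empirical([p_x.get(a, 0.0) for a in x]), entropy(p_x)),
        (empirical([p_y.get(b, 0.0) for b in y]), entropy(p_y)),
        (empirical([p_xy.get((a, b), 0.0) for a, b in zip(x, y)]), entropy(p_xy)),
    ]
    return all(abs(emp - h) < eps for emp, h in checks)

def typicality_decode(codebook, y, p_xy, eps):
    """Index of the unique codeword jointly typical with y, or None (declare an error)."""
    hits = [w for w, c in enumerate(codebook) if is_weakly_jointly_typical(c, y, p_xy, eps)]
    return hits[0] if len(hits) == 1 else None

# Hypothetical joint pmf and a tiny codebook; only codeword 1 matches y's joint statistics
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
codebook = [[1, 1, 1, 1, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]]
decoded = typicality_decode(codebook, y, p, eps=0.1)  # -> 1
```

Returning `None` when zero or several codewords pass mirrors the rule that anything other than a unique typical codeword is declared an error.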

There are three kinds of errors in this new scheme:

• The sequences $(s_1^{m,(n')}(w_1), s_2^{m,(n')}(w_2), \ldots, s_M^{m,(n')}(w_M))$ are not jointly typical with respect to (30); denote this event as $E^{(1)}$.

• The sequences $(s_1^{m,(n')}(w_1), \ldots, s_M^{m,(n')}(w_M), \hat{s}_1^{m,(n')}, \ldots, \hat{s}_M^{m,(n')})$ are not jointly typical with respect to (30); denote this event as $E^{(2)}$.

• For a given message $w_i$, there is more than one codeword in $\mathcal{C}_i$ that is jointly typical with $\hat{s}_i^{m,(n')}$ with respect to the marginal of (30); denote this event as $E_i^{(3)}$.

The overall error probability can be bounded as

$$\Pr(E_{n'}) \le \Pr(E^{(1)}) + \Pr\big(E^{(2)} \cap \overline{E^{(1)}}\big) + \sum_{i=1}^{M} \Pr\big(E_i^{(3)} \cap \overline{E^{(2)}}\big), \tag{34}$$

where the inequality is by the union bound.

Since all the codewords are generated independently according to the $P(S_i)$'s, by the basic properties of jointly typical sequences ([38], Theorem 14.2.1), $\Pr(E^{(1)}) \to 0$ as $n' \to \infty$. Since the $n'$ blocks are then processed independently by the same JSCMUD code over the memoryless channel, this implies that the reconstructions $\{\hat{s}_i^{m,(n')}, i = 1, 2, \ldots, M\}$ are jointly typical with $\{s_i^{m,(n')}(w_i), i = 1, 2, \ldots, M\}$ with probability approaching one, i.e., $\Pr(E^{(2)} \cap \overline{E^{(1)}}) \to 0$ as $n' \to \infty$. It follows that $\Pr(E_i^{(3)} \cap \overline{E^{(2)}}) \to 0$ as $n' \to \infty$, by (31), the basic properties of jointly typical sequences ([38], Theorems 14.2.1 and 14.2.2), and the fact that the number of codewords in $\mathcal{C}_i$ is $\exp(mn'(R_i(D_i + \epsilon) - \delta))$. Since there are a total of $M + 2$ terms in (34), it is clear that $\Pr(E_{n'}) \to 0$ as $n' \to \infty$.

By making $n'$ large and thus $\delta$ small, it is clear that the rate tuple $(R_1(D_1 + \epsilon), R_2(D_2 + \epsilon), \ldots, R_M(D_M + \epsilon)) \in \mathcal{C}_{\rm uni}$ for any fixed $\epsilon > 0$. Since the rate-distortion functions $R_i(\cdot)$ are continuous and the capacity region $\mathcal{C}_{\rm uni}$ is a closed set, it follows that $(R_1(D_1), R_2(D_2), \ldots, R_M(D_M)) \in \mathcal{C}_{\rm uni}$. Therefore we can conclude that $(D_1, D_2, \ldots, D_M) \in \mathcal{D}^*_{\rm uni}$ by the definition of $\mathcal{D}^*_{\rm uni}$ given in (13), and thus $\mathcal{D}_{\rm uni} \subseteq \mathcal{D}^*_{\rm uni}$.
This completes the proof. ■
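For completeness, the packing step behind the third error event (a wrong codeword of $\mathcal{C}_i$ being jointly typical with the reconstruction) can be written out as a sketch. Here $\epsilon_T$ is a typicality slack introduced only for this derivation (it is not the distortion slack $\epsilon$), information is measured in nats so that the codebook size and the typicality bound share units, each wrong codeword is independent of $\hat{s}_i^{m,(n')}$, and the last step uses (31):

```latex
\Pr\big(E_i^{(3)} \cap \overline{E^{(2)}}\big)
  \le \big(|\mathcal{C}_i| - 1\big)\, e^{-n'\left(I(S_i^m;\hat{S}_i^m) - 3\epsilon_T\right)}
  \le e^{mn'\left(R_i(D_i+\epsilon) - \delta\right)}\, e^{-n'\left(mR_i(D_i+\epsilon) - 3\epsilon_T\right)}
  = e^{-n'\left(m\delta - 3\epsilon_T\right)} .
```

Choosing $\epsilon_T < m\delta/3$ makes the right-hand side vanish as $n' \to \infty$.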

Remark: When a source Si is present at multiple nodes, these nodes in fact cooperate to communicate 
to the destination. A source Si present at multiple nodes in the joint coding problem implies a common 
message Wi available also at these nodes in the pure channel coding problem, and thus these nodes can 
cooperate to transmit this common message. This cooperation is implicit in the original JSCMUD code, 
but in the induced channel code it does not appear explicitly. Note that since our proof given above only 
relies on weak typicality, it can be extended straightforwardly to problems with more general (continuous) 
alphabets. 



VI. Approximate Optimality of Separation for Joint Source-Channel Multiple Multicast with Distortion

In this section we examine the third scenario considered in this paper, i.e., the case where there could be multiple receivers interested in the same source, but at different distortion levels. We limit ourselves to a class of distortion measures often referred to as "difference" distortion measures, since their properties play an important role in the proof. More precisely, the reconstruction alphabet is the same as the source alphabet, $\hat{\mathcal{X}} = \mathcal{X}$, where $\mathcal{X}$ is an Abelian group with a proper addition operation; furthermore,

$$d(x, \hat{x}) = d(x - \hat{x}), \tag{35}$$

i.e., the distortion mapping only depends on the difference between the two variables.
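Two familiar examples of difference distortion measures are squared error on the reals and Hamming distortion on the cyclic group $\mathbb{Z}_q$. The helper names below are hypothetical; the sketch just evaluates $d(x,\hat{x}) = d(x-\hat{x})$ with the group subtraction passed in explicitly, so that shift invariance can be checked.

```python
def squared_error(diff):
    """Difference distortion on the reals: d(x - x_hat) = (x - x_hat)^2."""
    return diff ** 2

def modular_hamming(diff, q):
    """Difference distortion on Z_q: 0 if x == x_hat (mod q), else 1."""
    return 0 if diff % q == 0 else 1

def distortion(x, x_hat, d, sub):
    """Evaluate d(x, x_hat) = d(x - x_hat), with the group subtraction `sub` supplied."""
    return d(sub(x, x_hat))

sub_real = lambda a, b: a - b
sub5 = lambda a, b: (a - b) % 5
d5 = lambda t: modular_hamming(t, 5)

print(distortion(3.0, 1.5, squared_error, sub_real))  # 2.25
print(distortion(7, 2, d5, sub5))                     # 0, since 7 = 2 (mod 5)
```

Because only the difference enters, shifting both arguments by the same group element leaves the distortion unchanged, which is the property exploited when $U_k$ is added to the source in the proof below.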

In order to present our results, some additional definitions are quoted directly from [14]. For a random variable $N$ in the alphabet $\mathcal{X}$, the capacity of the additive noise channel $X \to X + N$, where $X$ is also in the alphabet $\mathcal{X}$, under a $d(\cdot)$ constraint is defined as

$$C(D, N) = \sup_{X:\, X \perp N,\ \mathbb{E}d(X) \le D} I(X; X + N). \tag{36}$$

The addition $+$ is defined on the Abelian group $\mathcal{X}$, and it can be real addition, modulo addition or finite field addition where appropriate; $\perp$ here stands for independence. The minimax (or worst noise) capacity is defined as

$$C_{\mathcal{X}}(D) = \inf_{N:\, \mathbb{E}d(N) \le D} C(D, N). \tag{37}$$

$C_{\mathcal{X}}(D)$ can be interpreted as the capacity at equilibrium in a mutual information jammer game played over an additive-noise channel, in which both the expected noise and the expected input are limited to within $D$ in terms of $d(\cdot)$. Though for general difference distortion measures the quantity $C_{\mathcal{X}}(D)$ is a function of $D$, considerable simplification is possible in some specific cases. For example, when the distortion is mean squared error, $C_{\mathcal{X}}(D)$ is always 1/2 bit [14].
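A quick numerical view of why the value is 1/2 bit for mean squared error, restricted to Gaussian noise. This sketch assumes only the standard additive Gaussian noise capacity formula; it shows that Gaussian noise at the largest allowed variance $D$ yields exactly $\tfrac{1}{2}\log_2(1 + D/D) = \tfrac{1}{2}$ bit for every $D$, and that weaker Gaussian noise only helps the transmitter. The full minimax statement over all noise distributions is the result of [14] and is not proved here.

```python
import math

def gaussian_additive_capacity_bits(power, noise_var):
    """C(D, N) in bits for Gaussian N with variance noise_var under E[X^2] <= power.

    Standard additive Gaussian noise result: the supremum is attained by a Gaussian X.
    """
    return 0.5 * math.log2(1.0 + power / noise_var)

for D in [0.01, 1.0, 250.0]:
    # Worst allowed Gaussian noise (variance D): the capacity is 1/2 bit regardless of D
    print(D, gaussian_additive_capacity_bits(D, D))
    # Weaker noise (variance D/4) gives a strictly larger capacity
    print(D, gaussian_additive_capacity_bits(D, D / 4))
```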

Our approximation result is in a genie-aided form, where additional communication links with bounded 
capacities are provided by a genie. We will show that a separation-based approach using the original 
communication network together with the additional genie-provided communication network can achieve 
any distortion matrix D that is achievable in the original communication network with arbitrary joint coding 
schemes. To quantify this genie-provided communication network, recall that the $i$-th row of an achievable distortion matrix $\mathbf{D} \in \mathcal{D}_{\rm mul}$ corresponds to the distortions achieved for source $S_i$ at the destinations in the set $\mathcal{T}_i$. It will become clear in the proof that if the reconstructions of a source $S_i$ at multiple destinations in the set $\mathcal{T}_i$ are required to be at the same distortion level a priori, then these destinations can be viewed as a single super-destination, and the problem can be reduced to a simpler one. Therefore, without loss of generality, we shall assume that the reconstructions of a source at multiple destinations are at different distortion levels.

The decreasing sequence of distortions for the elements on the $i$-th row thus specifies an order $O_i$ of the set $\mathcal{T}_i$; let $O_i(j)$ be the $j$-th element of $\mathcal{T}_i$ according to the order $O_i$. We require this genie-provided network to support degraded message set broadcast from source $S_i$ to the nodes in the set $\mathcal{T}_i$, for each $i$ with $|\mathcal{T}_i| > 1$. In other words, for each source $S_i$ such that $|\mathcal{T}_i| > 1$ and for each $j = 1, 2, \ldots, |\mathcal{T}_i|$, there is a common link of capacity $R_{i,O_i(j)}$ per source sample§ from $S_i$ to all the nodes $O_i(j), O_i(j+1), \ldots, O_i(|\mathcal{T}_i|)$. These rate entries are collected into the rate matrix $\mathbf{R}$. Consider adding this genie-provided communication network on top of the original communication network, and denote the distortion region achievable on this joint communication network by a separation approach of successive refinement coupled with superposition channel coding as $\mathcal{D}^*_{\rm mul}(O, \mathbf{R})$.
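The bookkeeping just described (order each row of $\mathbf{D}$ by decreasing distortion, then let link $j$ reach nodes $O_i(j), \ldots, O_i(|\mathcal{T}_i|)$) can be sketched as below. The function names and the distortion values are hypothetical illustrations; distinct distortions are assumed, as in the text.

```python
def induced_order(distortions):
    """Order O_i: destinations sorted by decreasing distortion (worst receiver first).

    distortions: dict node -> achieved distortion D_{i,node}; values assumed distinct.
    """
    return sorted(distortions, key=lambda node: distortions[node], reverse=True)

def genie_links(order):
    """Map O_i(j) -> nodes reached by the common link R_{i,O_i(j)}, j = 1..|T_i|."""
    if len(order) <= 1:
        return {}  # single destination: no genie link is needed (entry O)
    return {order[j]: order[j:] for j in range(len(order))}

# Hypothetical distortions for a source with destinations 3 and 4, D_{1,3} > D_{1,4}
order_1 = induced_order({3: 0.20, 4: 0.05})   # [3, 4]
links_1 = genie_links(order_1)               # {3: [3, 4], 4: [4]}
```

The better receiver appears in every link, which is how it obtains the genie's information "for free" in the degraded message set.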

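Example: Consider the example given in Fig. 4. The sets $\mathcal{T}_i$'s are

$$\mathcal{T}_1 = \{3, 4\}, \quad \mathcal{T}_2 = \{4\}, \quad \mathcal{T}_3 = \{3\}. \tag{38}$$

When the distortion $D_{1,3}$ is larger than $D_{1,4}$, the orders are

$$O_1 = (3, 4), \quad O_2 = (4), \quad O_3 = (3). \tag{39}$$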
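The rate matrix $\mathbf{R}$ of the genie-provided communication network has the form (40): its only defined entries are $R_{1,3}$ and $R_{1,4}$, both in the row of source $S_1$, and all the other entries are $O$,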
where $O$ at row $i$ and column $j$ means that the genie does not need to provide an additional communication capability from source $S_i$ to node $j$; thus $R_{i,j}$ is not defined. The new network consisting of the additional

§ If $S_i$ is present at more than one node, i.e., $|\{k : i \in \mathscr{S}_k\}| > 1$, then $R_{i,O_i(j)}$ should be the sum rate per source sample of such common links from each of the nodes in $\{k : i \in \mathscr{S}_k\}$ to all the nodes $O_i(j), O_i(j+1), \ldots, O_i(|\mathcal{T}_i|)$.

Fig. 6. The example in Fig. 4 with the additional genie links, which are drawn in dashed lines. The region $\mathcal{D}^*_{\rm mul}(O, \mathbf{R})$ is the achievable distortion region using a separation-based scheme on this joint network.



genie communication network on top of the original source communication network is given in Fig. 6. The following theorem is our first result on general network multicast.

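Theorem 3: Let $\mathbf{D}$ be an achievable distortion matrix, for which $O$ is the corresponding order induced by $\mathbf{D}$. For any random variables $U_{i,O_i(j)}$ in the alphabet $\mathcal{X}_i$, $j = 1, 2, \ldots, |\mathcal{T}_i|$, such that

$$U_{i,O_i(|\mathcal{T}_i|)} = V_{i,O_i(|\mathcal{T}_i|)}, \tag{41}$$

$$U_{i,O_i(j)} = U_{i,O_i(j+1)} + V_{i,O_i(j)}, \quad j = 1, 2, \ldots, |\mathcal{T}_i| - 1, \tag{42}$$

where $O_i(j)$ is the $j$-th node index in the set $\mathcal{T}_i$ according to the order $O_i$, and the $V_{i,j}$'s are mutually independent and such that $\mathbb{E}\,d(U_{i,O_i(j)}) \le D_{i,O_i(j)}$, let the genie communication network support the rate matrix $\mathbf{R}^*$ whose elements are

$$R^*_{i,O_i(j)} = \begin{cases} C(D_{i,O_i(j)}, U_{i,O_i(j)}), & |\mathcal{T}_i| > 1,\\ O, & \text{otherwise}. \end{cases} \tag{43}$$

Then we have $\mathbf{D} \in \mathcal{D}^*_{\rm mul}(O, \mathbf{R}^*)$.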

Remark: It is clear that $\bigcup_{O} \mathcal{D}^*_{\rm mul}(O) \subseteq \mathcal{D}_{\rm mul} \subseteq \bigcup_{O} \mathcal{D}^*_{\rm mul}(O, \mathbf{R}^*)$ in the above theorem.

This theorem in fact provides more than one approximation, one for each set of Vij random variables, 
resulting in a rather powerful bounding tool. To find the tightest bound, we can optimize over these random 
variables under certain constraints depending on the achievable distortion matrix D. 

Under certain distortion measures, significant simplifications can be made. The quadratic distortion 
measure is an important special case of difference distortion measures since it is used in many practical 
systemil*]. The main result of this section is given for this case, which shows that a separation-based scheme 
is approximately optimal, universally across all distortion values, for the quadratic distortion measure. Note 
that the sources need not be Gaussian. 

Theorem 4: Let $\mathbf{D}$ be an achievable distortion matrix, for which $O$ is the corresponding order induced by $\mathbf{D}$. Let the sources $S_i$'s satisfy the condition that for all letters $s_i \in \mathcal{S}_i$, $\mathbb{E}(S_i - s_i)^2 < \infty$. Let the genie communication network support the rate matrix $\mathbf{R}^*$ whose elements are

$$R^*_{i,O_i(j)} = \begin{cases} 1/2 \text{ bit}, & j \le |\mathcal{T}_i|,\ |\mathcal{T}_i| > 1,\\ O, & \text{otherwise}. \end{cases} \tag{44}$$

Then $\mathbf{D} \in \mathcal{D}^*_{\rm mul}(O, \mathbf{R}^*)$ under the mean squared error distortion measure.

As mentioned earlier, our proofs for the JSCMUD and JSCMMD problems rely only on weak typicality instead of strong typicality; thus the results can be easily extended to sources and channels with continuous alphabets and unbounded distortion measures, under the technical condition that for each source $S_i$ and all letters $s_i \in \mathcal{S}_i$, $\mathbb{E}\,d(S_i - s_i) < \infty$. This condition ensures that the asymptotically small decoding error probability does not cause a significant change in the distortion behavior. This condition is not uncommon in source coding, see e.g., [18]; we shall refer to it as the "bounded expected distortion" condition.

In the simplest case where a single node broadcasts a Gaussian source to a set of receivers, this result essentially reduces to Corollary 1 given in [30] for the Gaussian source. The intuitive interpretation of the above result is that when a genie helps the separation-based scheme by providing half a bit of information for each receiver, and at the same time all the receivers with better-quality reconstructions receive this information for free, the genie-aided separation-based scheme is as good as the optimal ones.

Theorem 4 implies that the total aggregate throughput provided by the genie is $\frac{1}{4}\sum_{i \in Q} |\mathcal{T}_i|(|\mathcal{T}_i| + 1)$ bits per source sample, where $Q = \{i \in \{1, 2, \ldots, M\} : |\mathcal{T}_i| > 1\}$. This depends on the demand structure and grows large in networks with many sources, when most sources are required to be reconstructed at many destinations. However, for any fixed network, the approximation in Theorem 4 holds regardless of the quality of the channels. As such, this result is more useful in the high-resolution regime for large networks, when the genie network becomes negligible compared to the original communication network. In networks with only a few sources, or when most sources are to be reconstructed at only a few destinations, the aggregate throughput of the genie network is not large, and the approximation is usually sufficiently accurate.
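The aggregate throughput figure can be cross-checked by direct link counting, as a sketch. Assumptions: each source with $K = |\mathcal{T}_i| > 1$ destinations gets one 1/2-bit link per $j = 1, \ldots, K$, link $j$ reaches $K - j + 1$ nodes, and a common link to $k$ nodes is counted as $k$ separate links (as in the footnote on throughput). The helper names are hypothetical.

```python
from fractions import Fraction

def genie_throughput_bits(dest_set_sizes, per_link=Fraction(1, 2)):
    """Total genie throughput per source sample under the 1/2-bit rate matrix, by link counting."""
    total = Fraction(0)
    for K in dest_set_sizes:
        if K > 1:
            # link j (j = 1..K) reaches the K - j + 1 best receivers
            total += per_link * sum(K - j + 1 for j in range(1, K + 1))
    return total

def closed_form(dest_set_sizes):
    """(1/4) * sum over i in Q of |T_i| (|T_i| + 1)."""
    return Fraction(1, 4) * sum(K * (K + 1) for K in dest_set_sizes if K > 1)

# Example of Fig. 6: |T_1| = 2, |T_2| = 1, |T_3| = 1
print(genie_throughput_bits([2, 1, 1]), closed_form([2, 1, 1]))  # 3/2 3/2
```

Exact fractions are used so that the two counts can be compared without rounding.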

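Next, we focus on the proof of Theorem 3, since Theorem 4 can be directly obtained by using Gaussian auxiliary random variables $V$'s in Theorem 3: if the $V$'s are chosen so that each $U_{i,O_i(j)}$ is Gaussian with variance $D_{i,O_i(j)}$, then $C(D_{i,O_i(j)}, U_{i,O_i(j)}) = \frac{1}{2}\log_2\big(1 + D_{i,O_i(j)}/D_{i,O_i(j)}\big) = \frac{1}{2}$ bit.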
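The Gaussian choice can be sketched concretely for one source with distortions $D_1 > D_2 > \cdots > D_K$: take $V_K$ of variance $D_K$ and $V_k$ of variance $D_k - D_{k+1}$, so that the recursion $U_K = V_K$, $U_k = U_{k+1} + V_k$ gives $\mathbb{E}U_k^2 = D_k$. The function names and distortion values are hypothetical; the Monte Carlo estimate is only a sanity check of the variance bookkeeping.

```python
import random

def gaussian_noise_variances(D):
    """Variances of V_1, ..., V_K so that U_k from the recursion has variance D_k.

    Assumes D_1 > D_2 > ... > D_K > 0 (strictly decreasing distortions).
    """
    K = len(D)
    return [D[k] - D[k + 1] for k in range(K - 1)] + [D[K - 1]]

def sample_U(D, rng):
    """One draw of (U_1, ..., U_K) via U_K = V_K and U_k = U_{k+1} + V_k."""
    V = [rng.gauss(0.0, var ** 0.5) for var in gaussian_noise_variances(D)]
    U = [0.0] * len(D)
    U[-1] = V[-1]
    for k in range(len(D) - 2, -1, -1):  # fill U_{K-1}, ..., U_1 backwards
        U[k] = U[k + 1] + V[k]
    return U

rng = random.Random(2024)
D = [1.0, 0.5, 0.1]  # hypothetical distortions D_1 > D_2 > D_3 (mean squared error)
draws = [sample_U(D, rng) for _ in range(50_000)]
# Empirical E[U_k^2] should be close to D_k, i.e., E d(U_k) <= D_k holds with equality
second_moments = [sum(u[k] ** 2 for u in draws) / len(draws) for k in range(len(D))]
```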

Proof of Theorem 3: To simplify the notation, let us first consider a single source $S_i$; we can assume for the time being that the joint source-channel encoding procedure is still performed on the other sources, and we shall return to this point at the end of the proof. Without loss of generality, we can assume that the destination nodes of source $S_i$ are $1, 2, \ldots, K$, and moreover that the distortions achieved by the given joint source-channel code are ordered as $D_{i,1} > D_{i,2} > \cdots > D_{i,K}$. To simplify notation, we shall omit the index $i$ in $S_i$ for the time being; thus, instead of the cumbersome notation $D_{i,1} > D_{i,2} > \cdots > D_{i,K}$, we simply write $D_1 > D_2 > \cdots > D_K$.

Next we choose a set of auxiliary random variables as specified in the theorem. More precisely, we define the auxiliary random variables in the alphabet $\mathcal{S}$ such that

$$U_K = V_K, \qquad U_k = U_{k+1} + V_k, \quad k = 1, 2, \ldots, K-1, \tag{45}$$

where the $V_k$'s are random variables in the alphabet $\mathcal{S}$ that are independent of everything else. Furthermore, the auxiliary random variables are chosen such that $\mathbb{E}\,d(U_k) \le D_k$. For the moment this choice seems arbitrary, but we will link these auxiliary variables to the achievable distortions.

When computing throughput, we take a common link from a node to $k$ nodes as $k$ separate links; e.g., in Fig. 6 the common link of rate $R_{1,3}$ is considered as two separate links of capacity $R_{1,3}$ each.

Consider a JSCMMD code which induces the distortion vector $(D_1, D_2, \ldots, D_K)$ for source $S$, whose reconstructions are denoted as $\hat{S}_1^m, \hat{S}_2^m, \ldots, \hat{S}_K^m$. We shall view the transition probability $P(\hat{S}_1^m, \hat{S}_2^m, \ldots, \hat{S}_K^m \mid S^m)$ as a broadcast channel and denote it as $P_{bc}$. We need the following lemma to proceed, whose proof will be given shortly. For simplicity, we shall ignore asymptotically small quantities such as $\delta$ and $\epsilon$ in the previous section, which are inconsequential.

Lemma 2: On the broadcast channel $P_{bc}$, the following degraded message set broadcast rates can be (asymptotically) supported:

$$\begin{aligned} R_1 &= I(S^m + U_1^m; S^m) - mC(D_1, U_1),\\ R_k &= I(S^m + U_k^m; S^m \mid S^m + U_{k-1}^m) - mC(D_k, U_k), \quad k = 2, 3, \ldots, K. \end{aligned} \tag{46}$$

Moreover, these rates can be achieved by a random superposition code [38] based on the joint distribution of $(S^m + U_1^m, S^m + U_2^m, \ldots, S^m + U_K^m, S^m)$.

Though this lemma concerns the broadcast channel $P_{bc}$, in a manner similar to the proof for general network unicast, we can in fact conclude that on the original network, when all the other encoders still perform the original joint source-channel encoding, the communication channel from source $S$ to its destinations can support degraded message set broadcast rates $(R_1, R_2, \ldots, R_K)$ per $m$ source samples. This is because the broadcast channel $P_{bc}$ is simply the communication channel in the original network with certain additional operations on the block level.

Using this lemma, it is clear that, together with the genie-provided communication network, we can send messages within the degraded message set from source $S$ to its destinations at rates per $m$ samples

$$\begin{aligned} R_1^{(m)} &= R_1 + mC(D_1, U_1) = I(S^m + U_1^m; S^m),\\ R_k^{(m)} &= R_k + mC(D_k, U_k) = I(S^m + U_k^m; S^m \mid S^m + U_{k-1}^m), \quad k = 2, 3, \ldots, K. \end{aligned} \tag{47}$$

In other words, these degraded message set broadcast rates can be supported for pure channel coding purposes.

Note, however, that if we use the distribution of $(S^m + U_1^m, S^m + U_2^m, \ldots, S^m + U_K^m)$ to construct a successive refinement code [12], the rates $(R_1^{(m)}, R_2^{(m)}, \ldots, R_K^{(m)})$ are exactly the (asymptotic) source coding rates of this code per $m$ samples. Thus we can safely conclude that the distortion $\mathbb{E}\,d(S + U_k - S) = \mathbb{E}\,d(U_k) \le D_k$ is achievable at the $k$-th destination using the separation approach in this genie-aided network, because we can simply use the codewords generated by the distribution of $S^m + U_k^m$ as the reconstruction to achieve distortion $\mathbb{E}\,d(U_k) \le D_k$.

It remains to argue that if all the users simultaneously replace the original joint source-channel codes with the newly constructed channel codes, the rates that can be supported are still the same as above. This is indeed true, because the superposition code in Lemma 2 relies on joint typicality on the block level when the channel input has distribution $P(S^m)$. This does not change if the other users also replace their original joint source-channel codes with the newly constructed channel codes, since the superposition channel codes constructed this way for all the broadcast channels indeed preserve joint typicality according to the distribution $P(S^m, \hat{S}_1^m, \ldots, \hat{S}_K^m)$. This completes the proof, with the exception of Lemma 2. ■

To prove Lemma 2, we first give an auxiliary lemma.

Lemma 3: Let $S^m$, $U_k^m$ and $\hat{S}_i^m$ be specified as in the proof of Theorem 3. Then for $i = 1, 2, \ldots, K$,

$$I(S^m + U_1^m; S^m) - I(S^m + U_1^m; \hat{S}_i^m) \le mC(D_i, U_1), \tag{48}$$

$$I(S^m + U_k^m; S^m \mid S^m + U_{k-1}^m) - I(S^m + U_k^m; \hat{S}_i^m \mid S^m + U_{k-1}^m) \le mC(D_i, U_k), \quad k = 2, \ldots, K-1, \tag{49}$$

$$I(S^m + U_K^m; S^m \mid S^m + U_{K-1}^m) - I(S^m; \hat{S}_i^m \mid S^m + U_{K-1}^m) \le mC(D_i, U_K), \tag{50}$$

where $C(\cdot,\cdot)$ is defined in (36).
Proof of Lemma 3: We can write $I(S^m + U_1^m; \hat{S}_i^m, S^m)$ in two ways:

$$I(S^m + U_1^m; \hat{S}_i^m, S^m) = I(S^m + U_1^m; \hat{S}_i^m) + I(S^m + U_1^m; S^m \mid \hat{S}_i^m), \tag{51}$$

and

$$I(S^m + U_1^m; \hat{S}_i^m, S^m) = I(S^m + U_1^m; S^m) + I(S^m + U_1^m; \hat{S}_i^m \mid S^m) = I(S^m + U_1^m; S^m), \tag{52}$$

where $I(S^m + U_1^m; \hat{S}_i^m \mid S^m) = I(U_1^m; \hat{S}_i^m \mid S^m) = 0$, because the construction of the auxiliary random variable $U_1$ ensures that $U_1^m$ is independent of $(S^m, \hat{S}_i^m)$, as seen in (45).

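Thus we have

$$\begin{aligned} I(S^m + U_1^m; S^m) - I(S^m + U_1^m; \hat{S}_i^m) &= I(S^m + U_1^m; S^m \mid \hat{S}_i^m)\\ &\overset{(a)}{=} H(S^m + U_1^m \mid \hat{S}_i^m) - H(U_1^m)\\ &\le H(S^m - \hat{S}_i^m + U_1^m) - H(U_1^m)\\ &\le \sum_{j=1}^{m} \big[H(S(j) - \hat{S}_i(j) + U_1(j)) - H(U_1(j))\big]\\ &= \sum_{j=1}^{m} I\big(S(j) - \hat{S}_i(j);\, S(j) - \hat{S}_i(j) + U_1(j)\big)\\ &\le mC(D_i, U_1), \end{aligned} \tag{53}$$

where (a) follows again since $U_1^m$ is independent of $(S^m, \hat{S}_i^m)$, the fourth step uses the subadditivity of entropy together with the fact that $U_1^m$ is i.i.d., and the last step follows from the concavity of $I(X;Y)$ as a function of the input distribution (which makes $C(\cdot, U_1)$ concave and nondecreasing) together with $\frac{1}{m}\sum_{j=1}^{m} \mathbb{E}\,d(S(j) - \hat{S}_i(j)) \le D_i$. This proves (48) in Lemma 3.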
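The entropy chain in (53), before the final bound by $C(D_i, U_1)$, can be checked numerically in its single-letter form ($m = 1$) on the group $\mathbb{Z}_3$: $I(S+U; S) - I(S+U; \hat{S}) \le I(S - \hat{S}; S - \hat{S} + U)$ whenever $U$ is independent of $(S, \hat{S})$. The joint pmf of $(S, \hat{S})$ and the pmf of $U$ below are hypothetical, and the helper names are illustrative only.

```python
import math

def H(pmf):
    """Entropy in bits of a pmf given as a dict outcome -> probability."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, f):
    """Push the joint pmf through the map f (outcome -> new outcome)."""
    out = {}
    for outcome, p in joint.items():
        key = f(outcome)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_info(joint, f, g):
    """I(f(Z); g(Z)) in bits for Z distributed according to joint."""
    return (H(marginal(joint, f)) + H(marginal(joint, g))
            - H(marginal(joint, lambda z: (f(z), g(z)))))

q = 3
# Hypothetical joint pmf of (S, S_hat) on Z_3 x Z_3, and noise U on Z_3 independent of both
p_s_shat = {(0, 0): 0.30, (0, 1): 0.05, (1, 1): 0.25, (1, 2): 0.05, (2, 2): 0.30, (2, 0): 0.05}
p_u = {0: 0.6, 1: 0.2, 2: 0.2}
joint = {(s, sh, u): p * pu for (s, sh), p in p_s_shat.items() for u, pu in p_u.items()}

s_plus_u = lambda z: (z[0] + z[2]) % q
lhs = mutual_info(joint, s_plus_u, lambda z: z[0]) - mutual_info(joint, s_plus_u, lambda z: z[1])
rhs = mutual_info(joint, lambda z: (z[0] - z[1]) % q, lambda z: (z[0] - z[1] + z[2]) % q)
```

Here `lhs` equals $I(S+U; S \mid \hat{S})$ and `rhs` is the single-letter mutual information that (53) then bounds by $C(D_i, U_1)$.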

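Note further that for $k = 2, 3, \ldots, K$, we have

$$\begin{aligned} I(S^m + U_k^m; S^m, \hat{S}_i^m \mid S^m + U_{k-1}^m) &= I(S^m + U_k^m; S^m \mid S^m + U_{k-1}^m) + I(S^m + U_k^m; \hat{S}_i^m \mid S^m, S^m + U_{k-1}^m) && (54)\\ &= I(S^m + U_k^m; S^m \mid S^m + U_{k-1}^m), && (55) \end{aligned}$$

where the second term in (54) vanishes because $(U_k^m, U_{k-1}^m)$ is independent of $(S^m, \hat{S}_i^m)$, as well as

$$I(S^m + U_k^m; S^m, \hat{S}_i^m \mid S^m + U_{k-1}^m) = I(S^m + U_k^m; \hat{S}_i^m \mid S^m + U_{k-1}^m) + I(S^m + U_k^m; S^m \mid \hat{S}_i^m, S^m + U_{k-1}^m). \tag{56}$$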

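It follows that

$$I(S^m + U_k^m; S^m \mid S^m + U_{k-1}^m) - I(S^m + U_k^m; \hat{S}_i^m \mid S^m + U_{k-1}^m) = I(S^m + U_k^m; S^m \mid \hat{S}_i^m, S^m + U_{k-1}^m). \tag{57}$$

Note that

$$\begin{aligned} I(S^m + U_k^m; S^m \mid \hat{S}_i^m, S^m + U_{k-1}^m) &\overset{(b)}{=} H(S^m \mid \hat{S}_i^m, S^m + U_{k-1}^m) - H(S^m \mid \hat{S}_i^m, S^m + U_k^m) && (58)\\ &\le H(S^m \mid \hat{S}_i^m) - H(S^m \mid \hat{S}_i^m, S^m + U_k^m) && (59)\\ &= I(S^m; S^m + U_k^m \mid \hat{S}_i^m) && (60)\\ &= H(S^m + U_k^m \mid \hat{S}_i^m) - H(U_k^m) && (61)\\ &\le H(S^m - \hat{S}_i^m + U_k^m) - H(U_k^m) && (62)\\ &\le mC(D_i, U_k), && (63) \end{aligned}$$

where (b) follows because of the Markov string $S^m + U_{k-1}^m \leftrightarrow S^m + U_k^m \leftrightarrow S^m \leftrightarrow \hat{S}_i^m$, and the last step follows by the same argument as in (53). This proves (49) in Lemma 3.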

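Furthermore, because of the Markov string $\hat{S}_i^m \leftrightarrow S^m \leftrightarrow S^m + U_K^m \leftrightarrow S^m + U_{K-1}^m$, we have

$$\begin{aligned} I(S^m; \hat{S}_i^m \mid S^m + U_{K-1}^m) - I(S^m + U_K^m; \hat{S}_i^m \mid S^m + U_{K-1}^m) &= H(\hat{S}_i^m \mid S^m + U_K^m) - H(\hat{S}_i^m \mid S^m)\\ &= I(\hat{S}_i^m; S^m \mid S^m + U_K^m) \ge 0, \end{aligned} \tag{64}$$

and it follows that

$$\begin{aligned} &I(S^m + U_K^m; S^m \mid S^m + U_{K-1}^m) - I(S^m; \hat{S}_i^m \mid S^m + U_{K-1}^m)\\ &\quad \le I(S^m + U_K^m; S^m \mid S^m + U_{K-1}^m) - I(S^m + U_K^m; \hat{S}_i^m \mid S^m + U_{K-1}^m) \le mC(D_i, U_K), \end{aligned} \tag{65}$$

where the last inequality follows from (57)-(63) with $k = K$. This proves (50) in Lemma 3, and thus the proof is complete. ■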
Now we are ready to prove Lemma 2.

Proof of Lemma 2: We shall use the distribution of $(S^m + U_1^m, S^m + U_2^m, \ldots, S^m + U_K^m, S^m)$ to construct a superposition broadcast channel code on the broadcast channel $P_{bc}$ for the degraded message set. The rates (per length-$m$ block) for these messages within the degraded message set are (asymptotically)

$$R_1 = I(S^m + U_1^m; S^m) - mC(D_1, U_1), \tag{66}$$

$$R_i = I(S^m + U_i^m; S^m \mid S^m + U_{i-1}^m) - mC(D_i, U_i), \quad i = 2, 3, \ldots, K. \tag{67}$$

We need to show that the chosen rates can indeed be supported on this broadcast channel with a degraded message set.

We assume the readers are familiar with the superposition coding argument in [38].

Since this channel itself is not degraded, we have to show that the superposition coding scheme succeeds for all the receivers. To see this, observe that for the $i$-th receiver, we have

$$I(S^m + U_1^m; S^m) - I(S^m + U_1^m; \hat{S}_i^m) \le mC(D_i, U_1), \tag{68}$$

by Lemma 3. It follows that

$$\begin{aligned} I(S^m + U_1^m; \hat{S}_i^m) - R_1 &= I(S^m + U_1^m; \hat{S}_i^m) - I(S^m + U_1^m; S^m) + mC(D_1, U_1)\\ &\ge mC(D_1, U_1) - mC(D_i, U_1) \ge 0, \end{aligned} \tag{69}$$

where the last inequality is straightforward by noticing that

$$C(D, N) \ge C(D', N) \quad \text{when } D \ge D', \tag{70}$$

together with $D_1 \ge D_i$. Thus the $i$-th receiver can indeed decode the first message for every $i$. Similarly, we have for $i \ge k$

$$\begin{aligned} I(S^m + U_k^m; \hat{S}_i^m \mid S^m + U_{k-1}^m) - R_k &= I(S^m + U_k^m; \hat{S}_i^m \mid S^m + U_{k-1}^m) - I(S^m + U_k^m; S^m \mid S^m + U_{k-1}^m) + mC(D_k, U_k)\\ &\ge mC(D_k, U_k) - mC(D_i, U_k) \ge 0, \end{aligned} \tag{71}$$

and thus we conclude that the $i$-th receiver can decode messages $1, 2, \ldots, i$. The $K$-th receiver does not pose any additional difficulty. Thus the rates specified in (66)-(67) can indeed be supported on this super broadcast channel, and the proof is complete. ■

VII. Concluding Remarks

We considered the optimality of source-channel separation architecture in networks, and showed that the 
separation approach is optimal for the problems of distributed network joint source-channel coding and joint 
source-channel multiple unicast with distortion. Moreover, the separation approach is also approximately 
optimal for the problem of joint source-channel multiple multicast with distortion under certain distortion 
measures. The results in this work are obtained without explicit characterizations of the underlying regions. 
Such an approach of identifying properties without explicit individual component solutions is a valuable 
tool which may lead to further insights into network information theory problems. The source coding problem extracted from the distributed network source coding scenario implies that the interactive coding aspect needs to be carefully incorporated into this source coding problem, which suggests a distinct line of research into network source coding.

The requirement on the sources being independent in the general network unicast and multicast problem 
may seem rather stringent. However, in many practical situations the sources are indeed either independent 
or identical. Furthermore, the dependence structure among the sources can sometimes be approximated 
reasonably well by a model only allowing multiple independent or identical sources; one such example is 
in [45], where an approximate characterization of the rate-distortion region of the three-source Gaussian
distributed source coding problem was given. Thus our results on general network unicast and multicast 
may have implications in even broader settings. 

For notational and conceptual simplicity, we made many assumptions which are not strictly necessary. The results can be straightforwardly extended to more general cases with minimal effort; the proofs are left to interested readers.

• Distributed network joint source-channel coding: The synchronization requirement among sources can be removed, i.e., the source bandwidths do not have to be the same throughout the whole network. The multiple reconstructions of a source $S_i$ can be under different distortion measures; in fact, the distortion measures can be defined on multiple sources, e.g., to reconstruct $(S_1 - S_2)$. The optimality result can also be extended to lossless reconstruction under a vanishing block error probability requirement, instead of a zero distortion requirement.

• Joint source-channel multiple unicast with distortion: The synchronization requirement among 
sources and channels can be removed and the memoryless requirement on the channel can be relaxed 
to channels with finite memory (see [40] for an outline). As mentioned, the restriction on the discrete 
alphabets can be relaxed to sources and channels with continuous alphabets, where the sources satisfy 
the "bounded expected distortion" condition. The condition that each source is to be reconstructed at 
one destination can be relaxed to some extent. More precisely, when each source is to be reconstructed 
at multiple destinations but at the exact same distortion under the same distortion measure, then the 
source-channel separation architecture is still optimal. 

• Joint source-channel multiple multicast with distortion: Similar to the JSCMUD case, the synchronization, memoryless channel, and discrete alphabet requirements can be relaxed. The condition that each source is to be reconstructed under the same distortion measure can be relaxed to different distortion measures. In this case, the orders $O$ among the reconstructions are not straightforward; however, as we take all the possible orders $O$, there is always one order that yields a valid approximation. If some of the reconstructions of a source $S_i$ are specified to have the same distortion a priori, then the approximation upper bound can be improved; in particular, when the reconstructions of each source are specified to all have the same distortion, source-channel separation is optimal,

To see this, notice that we in fact proved that the source vectors $s^{m,(n')}$ and reconstructions $\hat{s}^{m,(n')}$ are strongly jointly typical, and if the original joint source-channel code guarantees vanishing length-$m$ block error probability, then we only need to add an error correction code (treating each length-$m$ block as a symbol) to boost the new code to have vanishing length-$mn'$ block error probability. The rate loss of this error correction code is negligible.






which is essentially the generalization discussed above for the JSCMUD problem. 

For DNJSCC in a communication network without any relay or feedback, the orthogonal channels do 
not need to be synchronized with each other, and the source-channel separation architecture is still optimal. 
We believe the optimality result holds in more general networks without the synchronization requirement, 
however the interleaving argument becomes exceedingly complex. 

Our proof for DNJSCC relies on the (uniform) Markov lemma and thus applies only to discrete alphabets; however, given the recent effort on extending the Markov lemma to abstract alphabets [46] (see also the generalized Markov lemma for Gaussian sources in [47]), we believe it should be possible to extend our result on DNJSCC to sources and channels with more general alphabets.

The result on JSCMMD is given in a genie-aided-communication-network form, and this may be less pleasing. In some cases, such a bound can be translated into multiplicative bounds on the difference between the separation scheme distortion region and the joint coding scheme distortion region. In [30], we provide such multiplicative bounds for the Gaussian source broadcast problem; in fact, two kinds of multiplicative bounds were provided in [30]: one is a direct translation of Theorem 4, and the other is a single constant bound which holds uniformly for all the components of the distortion tuples. The result in this work is
presented in the current genie-aided form partly due to the abstract setting. For a more specific model in 
which bandwidths, power, and other resources are given, the genie-aided form can often be converted to 
other more explicit forms, such as the multiplicative distortion bound or a genie-aided-resource form; this 
is an interesting future research direction. 

One fascinating aspect of our results is the difference between the separation in the DNJSCC problem 
and that in the JSCMUD problem. From a philosophical point of view, in the point-to-point setting, the 
source and channel are specified by their statistical behaviors alone; however in the network setting, the new 
network components of the connectivity structure among nodes and the source-demand coding requirements 
are introduced. Our result in DNJSCC treats the source statistics and these network components as a whole, 
and treats the channel statistics as the other, resulting in the separation between a complex network source 
coding problem and multiple conventional point-to-point channel coding problems. In contrast, the result 
in JSCMUD treats the channel statistics and the network components as a whole, and the source statistics 
as the other, resulting in the separation between a complex network channel coding problem and multiple 
conventional point-to-point source coding problems. These separations are not the only possibility, and one 
can choose to separate in a different manner. Indeed, the work of Han [25] suggests that for lossless coding over networks with orthogonal links without feedback, the joint coding problem can be separated without
loss of optimality into a moderately complex network source coding problem (Slepian-Wolf coding) and a 
complex network channel coding problem. Thus the problem of source-channel separation is by no means 
solved, and it calls for further investigation. 
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